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This  constitutes  our  final  report  on  a research  program  aimed  at  the  development 
of  a high  quality  low  data  rate  speech  transmission  system  based  on  new  types  of 
speech  modeling  algorithms.  Several  such  algorithms  were  developed  and  tested  on 
simulated  and  real  speech  data.  These  algorithms  have  many  desirable  features 
including  the  capability  of  rapidly  tracking  time-varying  model  parameters. 

The  best  algorithm  was  used  as  the  basis  of  a speech  transmission  system  in  order 
to  test  the  quality  of  the  speech  models.  The  model  parameters  (reflection 
coefficients)  together  with  pitch  information  and  speech  energy  form  a speech 
parameter  vector  to  be  transmitted  and  used  to  reconstruct  the  original  speech. 
Several  parameter  quantization  methods  were  considered  to  achieve  the  desired  low 
bit  rates. 

The  various  algorithms  as  well  as  the  complete  transmission  system  were  coded 
and  tested.  Simulation  results  are  very  promising  and  the  usefulness  of  our  new 
approach  for  speech  modeling  has  been  successfully  established. 
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1.  Introduction 


This  investigation  was  concerned  with  the  ‘development  of  digital  voice 
communication  systems  capable  of  low  data  transmission  rates.  Such  systems 
construct  a time-varying  linear  model  of  the  speaker's  vocal  tract,  and  transmit  the 
encoded  model  parameters  over  a digital  network.  The  receiver  reconstructs  the 
model  from  the  coded  parameters  and  synthesizes  an  approximation  of  the  original 
speech  signal.  The  model  is  traditionally  constructed  by  the  method  of  linear 
predictive  coding  (LPC),  which  predicts  future  speech  samples  as  linear 
combinations  of  past  speech  samples,  where  the  linear  combination  is  chosen  to 
minimize  the  prediction  error.  This  results  in  a vector  A of  coefficients  which 
characterize  the  speech  production  mechanism  in  terms  of  an  inverse  filter  a(z)  (the 
vocal  tract  is  considered  the  filter  l/«(z)  ).  The  A coefficients  can  be  used  to 
reconstruct  the  speech  signal.  In  practice  they  must  be  encoded  to  achieve  low  bit 
rates,  but  this  problem  can  be  separated  from  the  modeling  problem  proper. 

Our  goal  was  to  investigate  the  application  of  new  linear  estimation  algorithms 
to  speech  modeling.  This  involves  both  modeling  and  encoding  issues.  There  are  a 
large  number  of  ways  to  approach  the  speech  modeling  problem,  but  here  we 
restrict  our  attention  to  exact  least-squares  linear  estimation  procedures  (there  is 
currency  no  reason  to  examine  sub-optimal  or  approximate  methods).  These 
estimation  methods  find  linear  models  which  fit  the  (speech)  data  optimally  in 
terms  of  minimizing  the  sum  of  the  squared  errors— hence  the  term  "least-squares". 
The  LPC  methods  currently  used  in  speech  modeling  are  least-squares  estimation 
procedures  which  find  all-pole  or  autoregressive  (AR)  models.  The  assumption 
that  an  all-pole  model  is  sufficient  is  valid  for  vowel  sounds  (disregarding  sound 
radiation).  However  nasal  sounds  require  zeros  and  a pole-zero  or 
autoregressive-moving  average  (ARMA)  model  should  produce  a more  efficient 
speech  encoding.  Another  aspect  of  our  modeling  effort  is  the  extensive  use  of 
ladder-form  realizations  and  their  reflection  coefficient  K for  speech 


parametrizations.  The  K coefficients  have  many  advantages  over  other  model 
parametrizations,  such  as  better  numerical  properties  and  fast  convergence,  and  can 
be  used  directly  in  a ladder-form  synthesis  structure.  Each  representation  can  be 
converted  to  the  other  in  a one-to-one  fashion,  but  the  K coefficients  have  physical 
significance  in  speech  modeling  because  they  correspond  to  acoustic  reflection 
coefficients  in  a segmented  tube  model  of  the  vocal  tract.  Actually,  it  is  possible  to 
modify  the  Levinson  recursion  to  avoid  the  use  of  the  prediction  parameters  A 
when  computing  the  reflection  coefficients,  as  mentioned  in  [MLNV].  See  also 
[Vieira]. 

Various  new  speech  modeling  algorithms  were  developed  using  the  techniques 
associated  with  our  fast  algorithms: 

- Pre- windowed  ladder-form  (AR) 

- Covariance  ladder-form  (AR) 

- Pole-zero  ladder-form  (ARMA) 

All  of  these  techniques  use  exact  recursive  least-squares  parameter  estimation 
algorithms,  i.e.  they  are  ideal  on-line  modeling  methods,  with  fast  "adaptive" 
properties.  Their  computational  requirements  (per  sample)  are  proportional  only  to 
n,  the  number  of  model  parameters  - again  a feature  that  is  well  suited  for 
hardware,  parallel  processing  or  pipeline  implementation  of  our  algoritljgis. 

The  implementation  of  the  pie-windowed  (PW)  ladder-form  has  been  enhanced 
with  the  introduction  of  tracking  of  time-varying  parameters.  This  can  only  be 
done  with  an  on-line  method  and  meshes  naturally  with  the  PW  ladder 
formulation.  The  effect  is  that  the  parameter  estimates  track  the  actual  dynamics 
by  weighting  recent  data  more  heavily  than  older  data.  The  dynamics  in  model 
order  can  also  be  tracked.  These  tracking  capabilites  are  necessary  for  estimates  of 
transients  (or  transemes)  or  plosives.  The  weighted  forms  of  the  covariance  ladder 
and  the  rational  ladder  algorithms  were  developed.  We  discovered  that  the  tracking 
ability  of  our  algorithm  is  actually  even  better  for  voiced  speech,  as  the  glottal 
pulses  help  the  parameters  to  converge  within  a few  samples  virtually  to  a constant 
over  a pitch  period  - a fact  that  leads  to  reduced  data  rates.  For  unvoiced  speech  (i.e. 


Gaussian  type  residuals)  the  tracking  is  still  fast  but  very  smooth  - another 
desirable  feature. 

This  PW  ladder-form  algorithm  provides  the  basis  for  our  digital  speech 
transmission  system.  The  system  consists  of  a speech  analyzer  which  produces  a 
(slowly)  time  varying  parameter  vector,  an  encoder  that  converts  a single  frame  of 
speech  parameter  vectors  into  a binary  data  stream,  and  a decoder  that  converts  the 
binary  stream  into  parameter  vectors  which  are  used  by  the  speech  synthesizer  to 
reconstruct  a signal  that  sounds  like  the  original  speech.  The  speech  parameter 
vector  consists  of  the  reflection  coefficients,  the  pitch  period  (or  time  index  of  the 

K j 

. beginning  of  the  pitch  period),  and  the  energy  contained  in  the  speech  frame  (or 

j 

equivalently  in  the  residuals).  The  pitch  information  is  obtained  by  a novel  pitch 

: i 

detection  method,  resulting  from  our  recursive  ladder-form  algorithm,  using  a log 

*1 

likelihood  ratio  parameter  that  is  computed  by  the  algorithm  in  order  to  separate 

I 

out  the  jump  process  type  pitch  pulses  from  the  residuals. 

Several  quantizing  methods  were  considered,  for  moderate  bit  rates,  (e.g.  4800  - 
9600),  single  symbol  quantization,  i,e.  independent  quantization  of  each  parameter 
is  considered  sufficient.  For  lower  (e.g.  1200  and  below)  rates,  a new  parameter 
vector  quantization  scheme  based  on  a minimum  distortion  measure  was 
considered.  Such  methods  are  being  developed  by  R.  M.  Gray  at  Stanford  (under 
AFOSR  sponsorship).  (See  [Buzo].)  These  new  quantization  schemes  are  still  in  the 
development  stage;  however  they  are  sufficiently  promising  so  that  a short  sample 
speech  segment  was  quantized  with  approximately  3 db  Itakura-Saito  rate 
distortion  deviation  ( from  the  unquantized  reconstruction  ) at  a rate  of  roughly 
700  bits  per  second.  In  this  new  method  the  parameter  space  of  the  reflection 
coefficients  is  partitioned  into  a number  of  cells.  Whenever  the  parameter  vector 
falls  within  a given  cell,  the  binary  number  representing  that  cell  is  transmitted. 
The  partitioning  of  the  parameter  space  is  chosen  so  as  to  minimize  a given,  e.g. 
Itakura-Saito,  distortion  measure.  In  the  actual  (on-line)  quantization,  a 
mean-square  error  (Euclidean  distance)  criterion  is  used  to  pick  the  actual  code 
transmitted.  These  methods  have  great  potential  to  provide  high  quality  low  bit 
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rate  digital  voice  encoding  for  the  future.  In  the  simplest  case,  pitch  period  and 
(log)  energy  or  gain  are  envisioned  to  be  coded  via  a standard  delta  modulation  type 
scheme;  however,  this  is  only  considered  in  order  to  be  able  to  compute  an 
achievable  lower  bound  on  the  transmission  rate.  A real  implementation  could  use 
more  sophisticated  coding  schemes,  a task,  beyond  the  scope  of  this  research.  Using 
rate  distortion  encoding  schemes,  a given  (low)  rate  can  be  achieved  with  a 
minimal  loss  in  speech  quality,  once  a suitable  distortion  measure  (such  as  the 
Itakura-Saito)  has  been  agreed  apon.  A number  of  simulations  were  performed  on 
the  complete  transmission  systems  and  the  results  so  far  are  very  promising. (See 
Section  5.) 

This  final  project  report  presents  the  theoretical  results  of  our  research,  the 
actual  algorithm  implementations,  and  the  simulation  results  of  the  first  year  of  an 
originally  estimated  two  years  worth  of  research.  Sections  E and  3 present  all  the 
analytical  work  that  was  performed.  Section  4 and  Appendix  C discuss  the 
software  generated  as  a result  of  our  analysis,  and  Section  5 describes  the  actual 
simulation  results. 
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2.  Algorithms  for  Speech  Modeling 

A number  of  new  algorithms  for  speech  modeling  were  developed  in  the  course 
of  our  investigation,  using  our  fast  algorithm  approach  to  estimation  and  system 
identification.  The  main  features  of  these  algorithms  are 


- They  are  exact  least-squares  methods 

- They  are  recursive  in  time  and  in  the  order  of  the  model  and 
thus  capable  of  processing  data  as  it  comes  along  (i.e.  not 
block-by-block,  unless  desired ) 

- Compute  directly  either  the  predictor  coefficients  ( AR  model) 
or  the  reflection  coefficients  ( ladder  form);  provide 

true  unbiased  estimates 


- Capability  of  tracking  time-varying  model  parameters 

- Good  stability  and  convergence  properties 

■ 

- Computationally  efficient 

In  this  section  we  describe  in  detail  the  development  of  several  algorithms.  For 
several  reasons,  the  emphasis  is  on  the  development  of  ladder-form  realizations.  In 
particular  the  reflection  coefficients  appearing  in  these  forms  turn  out  to  be  an 
excellent  way  of  parametrizing  speech.  Both  autoregressive  (AR)  and  pole-zero  or 
autoregressive  moving-average  (ARMA)  type  algorithms  are  derived. 

Sections  2.1  - 2.4  present  the  detailed  derivation  of  the  algorithms  for  the 

. 

pre-wlndowed  and  non-windowed  (covariance-form)  ladder-forms.  The 
pre- windowed  versions  of  these  algorithms  play  a central  role  in  our  Investigation, 
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and  Section  2.3  discusses  the  necessary  modifications  to  make  them  capable  of 
tracking  time  varying  parameters.  Pole-aero  or  autoregressive  moving-average 
(ARMA)  algorithms  are  derived  in  Section  2.5  . 

The  computational  requirements  of  various  algorithms  are  summarized  in 
Section  2.6  and  compared  to  currently  used  methods.  Finally,  some  difficulties 
which  arose  during  the  algorithm  implementation  phase  and  the  method  by  which 
they  were  rectified  are  briefly  described  in  Section  2.7  . 

For  an  overview  of  the  various  algorithms  derived  in  this  section  and  the  way 
they  are  related  to  each  other  - see  Appendix  A.  The  importance  of  ladder  form 
realizations  in  estimation  and  modeling  is  briefly  summarized  in  Appendix  B. 


In  this  section  we  introduce  a framework,  which  is  later  used  for  developing 
several  exact  least-squares  algorithms  for  AR  - type  models.  The  basic  problem  is 
presented  from  an  estimation  theory  approach  which  leads  to  the  deterministic 
least-squares  framework. 


2.1.1  Basic  Problem 

We  model  speech  as  the  output  of  a time-varying  linear  system,  which,  over  a 
short  time  interval,  can  be  approximated  by  a time-invariant  filter  of  the  form 

y{z)  - H( z)  u(z)  , (1) 

with  y(z)  being  the  z-transform  of  the  discretized  speech  signal,  H{z)  the  overall 

transfer  function,  and  u(z)  the  input  driving  function  which  consists  of  a periodic 

pulse  train  (approximating  the  glottal  pulses  for  voiced  sounds),  and  zero-mean 

white  noise  (for  unvoiced  sounds).  Such  a model  is  widely  accepted  by  the  speech 

research  community  as  a good  description  of  the  speech  generation  process. 

Detailed  discussions  on  this  model  can  be  found  in  [MKD]  and  [Fla].  A particularly 

popular  model,  see  e.g.  [MG],  is  one  where  H{z)  is  a finite  order  all-pole  filter 

P 

z)  - 1/(1+  2 APik)  z'k) 

1-1 

■ 1 / Ap{z)  . (2) 


Such  a model  is  equivalent  to  modeling  { y(>) } as  an  autoregressive  (AR)  process 

h + V^i-i  + • • • + Ap{P)yt.p  m ut  ■ (3) 

Rewriting  (3)  as 


P 

y m - V Ap®  yt  It  + Uj  , 

Jfc-1 

the  all-pole  filter  of  (2)  forms  a one-step  linear  predictor  for  { y(-) } 

P 

ii[i-i, t-p] " • 2 Ap{k)  yt-h  • 


(4) 


(5) 
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A very  common  and  practical  criterion  is  to  choose  the  predictor  that  minimizes 
the  squares  of  the  prediction  errors,  thus  leading  to  a least-squares  estimation 
problem. 

We  assume  for  the  moment  that  { jK*) } (for  generality  we  let  yt  be 
m -dimensional  vectors)  is  a stationary  process  with  known  covariance  m by  m 
matrix  function 


E yty\-k  - Rk  • 

R_k  - Rf  , k - 0,  I P (6) 

The  innovations  or  the  forward  prediction  errors  are  then  given  by 

ip,t  m yt  - iiii-i,  t-pj 

- yt  + V1J  + • • • + VP)  y%-P' 

- AJp  y[t:t-p]  . 

with 

A!p  - [ lm  , Ap^\  ....  Ap(P)  ] , (m  by  ( P+\)m  ) 

y[t:,-P]  m C V > >MT VPT  3 ■ ^ by  (P+l)m  ) 

I - m by  m identity  matrix.  (8) 

The  innovations  should  satisfy  the  following  orthogonality  property 

* ?JfcT  " 0 • t-PskSt-  1 (9) 

E <P,t  «p/  “ £ «P,i  V 

- (10) 

From  (9)  and  (10),  we  see  that  -Ap  can  be  obtained  as  the  solution  of  the  linear 
matrix  equation  called  the  Normal  Equation 


R 


P 


0 

0 

0 

0 


(11) 


where  Rp  is  the  (P+  l)m  by  (P+l)m  block  Toeplitz  matrix 


(12) 


RP  " E Y[tit-P ] ^[ttt-P] 


Writing  it  out  in  full 


R0  R j 
^-1  *0  Rl 


. . R, 


'•P-l 


^-2  ^-1 


P 


M 


• • Rr 


(13) 


We  note  that  by  virtue  of  stationarity  of  { y{') } that  R ip  is  independent  of  r and 
that  the  covariance  matrix  Rp  is  Toeplitz.  This  special  structure  makes  it  possible 
to  solve  the  normal  equation  (11)  with  fewer  than  the  0{P 3)  computations 
generally  necessary  to  solve  P linear  equations  in  P unknowns.  In  fact,  Levinson 
[Lev]  and  then,  for  vector  processes,  Whittle  and  independently  Wiggins  and 
Robinson  [WR],  derived  a scheme  for  solving  (11)  with  0{P2)  computations.  (Here 
a computation  is  taken  as  one  multiplication  of  two  real  numbers.)  This  algorithm, 
which  we  call  the  LWR  algorithm,  involves  the  simultaneous  solution  of  ( 1 1)  and 
an  auxiliary  equation 

Rp  Rp  • [ 0,  0,  . . . i Rr p ] ^ t 

where 

BJp  - [ Bp<P)  , Bp(p-1] Spa),  Im  ].  (14) 

and  is  actually  a backwards  predictor,  with  backwards  prediction  errors  defined  by 

A 

rp,t  - yt-p  - yt-p\[t-PA,  t] 

- BpWyt  * ...  * Sp(1)  yt_p.i  + yt.p  . 

" BJP  y[t:t-P ] • <15> 

and 

E rP,t  yk  “ 0 • t-P  + l skst 


(16) 


i I 


rP,t  rP,t 


T - E r 


p,t  yt-p 


- Rrt 


The  basic  idea  of  the  LWR  algorithm  is  to  compute  Am  and  Bm  recursively  in 
order  from  n - 0 to  P . Here  we  give  the  recursiona  of  the  algorithm,  and  a detailed 
discussion  can  be  found  in  [X-74]  and  [Vie]. 


2.1.2  LWR  Algorithm 


Iterate  on  n-0  until  n»P 


n+1 


An+1  Bn+1 


Rr 


n*  1 


Ln+1 


B_ 


*«+  1 


6 


n+l 


(17) 


where 


6 


n+1  “ 


m 


D-<  a T 
R n ^n+1 


-R'\  A 


n ^/i+l 


m 


(18) 


n-1 


^n+l  “ Z *n*  ^ Rk  * & ^ *n,i  J’Vn-l  ^ m & C <n(,  rTn,t_i  3 
Jt-0 


(19) 


with  initial  conditions: 

R* q ■ Rr§  m Rq  i 


A0  " *0  - !m 


(20) 


The  An+J  defined  in  (19)  is  known  as  the  partial  correlation  coefficient 
(PARCOR)  between  the  forward  and  backward  prediction  errors,  and  when 
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J 


appropriately  normalized  by  Rtn  and  Rrn,  they  become  the  so-called  reflection 
coefficients. 


! . 


■ i 


L 


A compact  way  to  rewrite  the  algorithm  is 


’ R'p 

0 

*0 

V 

Im 

0 

Ap 

■ 

0 

0 

Rrp  _ 

. *1 

*0 

91  ®2  ®3 


8, 


(21) 


The  important  point  which  (21)  brings  out  is  that  both  Ap  and  Bp  can  be 
completely  characterized  by  { 6n  , n - 1,  . . . , P },  and  therefore  by 

{ Rq  ; Afl  , n - 1 P }.  It  turns  out  that  this  parametrization  of  the  predictor 

offers  many  advantagjs  such  as  "stability  by  inspection"  property"  [MLVN]  and 
they  form  the  basis  for  implementing  the  predictor  in  ladder  forms.  The 
development  of  various  ladder  forms  will  be  presented  in  the  subsequent  sections. 


In  real  time  applications,  no  ensemble  averages  are  available.  The  covariance 
functions  usually  are  not  directly  obtainable  and  must  be  approximated  by  time 
averages  from  the  given  data.  This  leads  to  the  following  deterministic 
least-squares  problem. 


2.1.5 


r 

r 


2.1.3  Deterministic  Least-Squares  Problem 
Given  a series  of  observations  { y(t),  S S t iT  },  where  { y(>) } can  be 
m -dimensional  vectors,  we  wish  to  find  the  linear  one-step  predictor  of  order  P, 
parametrized  by  the  (matrix)  predictor  coefficients  { -Apgf®,  M, ....  P } , that 
minimizes  the  sum  of  the  square  of  prediction  errors  t pgjt  t) , where 

m h " . . . , t-P 

m yt  + ap,s,t yt-i  + • • • + ap£,t{p)  yt-p>  s st  st.  (22) 

In  matrix  notations,  we  have 

" A1P£,T  y[r.t-P]  • (23) 

with 

A1P£,T  m [ 1m  • AP,S,T{1) AP,S,T{P)  ] • 

y[ut-p]  ■ £ y^  • y^ i vpT  ^T-  (24) 

We  can  express  the  squared-error,  t pgj , by 

/ 

tp£,T  ’ rr(  S.  ip&lt*)  (JPE,^  ) 

(25) 

(26) 


(27) 

It  is  well-known  in  least-squares  theory  that  the  Apsp  that  minimizes  f ptstT 
is  obtained  by  solving  a linear  matrix  equation  called  the  Normal  Equation  of  the 
following  form 
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rp&t  ap,s,t 


P,S,T 

0 

0 

0 

0 


where 


tr  R(P£,T  “ min  A *P£,T 


(28) 


(29) 


The  solution  for  Apgp  in  (2S).  f°r  the  scalar  case,  would  Involve  the  inversion 
of  Rpts,T  • an  n b 71  matrix,  and  thus  would  require  0(n3)  computations. 
However,  when  R p$j  carries  some  shift-invariance  structure,  for  example  a 
Toeplitz  matrix,  a reduction  of  computations  is  achieved.  In  the  case  where  Rp 
is  Toeplitz,  equation  (28)  can  be  solved  via  the  Levinson  algorithm  [MG],  [MVLK], 
requiring  only  0(n2)  computations.  The  basic  approach  is  to  build  up  the  predictor 
recursively  in  order,  i.e.  by  recursively  obtaining  {^P)s,r>  P ■!.•••.  P } • 

In  our  present  problem,  the  structure  of  Rps,T  depends  on  the  choice  of  i and  /. 
Here  we  consider  the  following  three  cases  of  importance: 

(1)  i-$,/-7\ 

This  is  called  the  " pre-windowed " case,  since  one  has  to  make  the  assumption 
that  y(t ) - 0,  for  t < S.  Thus  ^ ' P S T becomes 

ys  • • • ys*p  - • ■ >r 


P,S,T  " 


ys 


yr-p 


(30) 


(2)  + 

This  is  called  the  "non-windowed"  case,  i.e.  no  windowing  is  applied  to  the 
observed  data.  This  is  also  known  as  the  "covariance"  method.  In  this  case  Y p S T 
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becomes 


W 


?S*P 


yr 


ys 


yr-pJ 


(31) 


(3)  i"  5,  f ~T  + P. 

This  is  called  the  "pre-  and  post-windowed"  case,  since  one  must  now  assume 
that  y(t)  - 0,  for  t < S and  t > T.  This  is  also  known  as  the  "autocorrelation" 
method.  In  this  case  zeros  are  added  both  before  the  first  sample  and  after  the  last 
sample,  and  Y p ^ T takes  the  form 


W 


ys  • • • ys*p  • • • yr 


ys 


yT,p.  . ■ y T. 


(32) 


The  names  " covariance " method  and  " autocorrelation " method  are  traditional 
in  the  speech  processing  literature,  but  from  a statistical  point  of  view  such 
nomenclature  is  not  completely  justified. 

We  may  note  that  only  in  the  "pre-  and  post-windowed"  or  "autocorrelation" 
method,  Rp  is  a Toeplltz  matrix,  while  in  the  other  two  case  it  is  no  longer 
Toeplitz.  However,  even  though  Rp is  non-Toeplitz  when  defined  in  (30)  or 
(31),  it  is  the  product  of  two  Toeplit2  matrices  and  therefore  still  carries  a certain 
shift-invariance  structure.  A class  of  algorithms  are  presented  in  [FMDK]  and 
[MDKV]  for  inverting  matrices  which  are  sums  of  products  of  Toeplitz  matrices 
and  the  algorithm  as  Investigated  here  is  a special  case  of  that  class. 


2.2  The  Pre- Windowed  Ladder  Form 


In  this  section  we  present  a detailed  derivation  of  the  pre-windowed  (PW) 
ladder  form.  At  the  end  of  the  section  we  will  show  that  this  particular  form  is 
actually  a good  approximation  to  a recursive  maximum  likelihood  method  for  the 
autoregressive  model. 


2.2.1  Algorithm  Development 

For  notational  convenience,  we  let  the  observations  start  at  time  zero,  i.e.  5 - 0 , 
and  thus  from  here  on  we  simply  drop  the  S index  altogether.  Thus  in  the 
pre-windowed  case  the  covariance  matrix  for  order  p has  the  form 

Rp,T  * Yp,T  V*  (1) 


v- 


yo  • • • yP  • • • yp 


The  matrix  Rpj  defined  above  satisfies  the  following  shifting  properties  (or 


recursive  identities). 

Order-update  (down-shift): 


>i,T  m 


*2  Rp,r-i 
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Order-update  (up-shift): 


V X3 


x. 


J 


(4) 


Time-update: 


Rp,T+  1 " Rp,T  + 


yr+i 


c >7r*i yJT-v* i 3 


7-p*i 


(5) 


where  the  X’s  represent  unspecified  matrix  elements  along  the  appropriate  edges. 


1 

Define  Apj , BpT , and  C^j,  for  p - 0 , 

1 , . . 

, p by 

t 

1 * 

I 0 

yr 

0 

I 0 

>7-1 

fi- 

I Ap,t  • Bp,r  • cp,r ] - 

• 

1 . 

. 

0 

1 0 

yT-p*  i 

0 

1 Rrp,T 

yr-p  . 

(6) 


where  Apj  and  &pj  are  respectively  the  forward  and  backward  predictors  of  the 
form 


ATpj  - 

^ ■ ^p,T^  * 

Vw  i 

b\,t  - 

[ vw  • • • • 

' Bp,rnl  • 'm  1 

and  C y is  an  auxiliary  vector  which  can  also  be  expressed  by 

c ~ - 

p.7  " 

RpJl  Y[T-T-p]  • 

and 

(7) 

(8) 


(9a) 


CP,T  ‘ ^[T-.T-p]  RP,f1- 


(9b) 


1 
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We  define  tpf,  the  forward  prediction  errors  or  innovations,  and  rpT,  the 
backward  prediction  errors  by 

r*p,rl  \ A\tT  1 \ yr  1 


P,T  J ?T-1 


yr-pA 


We  define  an  auxiliary  quantity  “i  pj  by 
yP,T  m ^P,T  Y[T:T-p]  ■ 


(11a) 


From  the  definition  of  CpT  given  in  (6),  pj  can  be  interpreted  as  the  weighted 
energy  of  the  observations  { yT_p  , . . . , yT },  which  can  also  be  expressed  as 

yP)T  ■ bJT  • • • • yJr-p  'j  R~lp,r  yr 

.yT-P\  • <llb> 

It  also  has  an  interpretation  as  a likelihood  variable  which  we  will  discuss  furthur 
in  a later  part  of  this  section. 
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2.2.2  Order  Update  Recursions 


Suppose  we  already  have  the  predictors  ApT  and  Bpj,  and  want  to  increase  the 
predictors  order  to  ^+1.  We  therefore  would  like  A y and  Rp*\,T  t0  satisfy  the 


normal  equation 


Rp+'  t [ Ap+i,r  • Bp+i,r  3 


and  we  start  by  using  relation  ( 4 ) : 


J?  . »*i  A I. 

P*l,”  p,T 


x xx 


where 


Ap*1,T  “ C last  ^lock-row  of  RpAiT  ] Ap  T 


2 yt.p. i <Tp,^^ 
»-  0 


Here,  we  can  relate  &p+\tT  t0  t*1®  partial  correlations  discussed  in  the  previous 


section. 


X 


X 


0 


Similarly,  using  the  relation  ( 3 ) : 
0 


xpA,T 


L Bp,t-  1 


x 

X 


X R 


P,T- 1 


L Bp,t-  1 


1 P*\,T 
0 

0 

0 

0 

lRTp,T-\  J 


where 


Tp*1,T  " Ifint  block-row  of  Rp.ir  ] 


2 * • 

i-l 


P.T-1 


We  can  show  that  Ap.ij  = by  noting  that  Rpt i r is  symmetric,  and 


a\,t  0 

Ap,T  0 

f 

. 0 BTp.r-i. 

. 0 Bpj-lj 

a\,t  0 

‘ *%,r 

VP*l,T 

0 

0 

. 0 BTp,r-L 

0 

0 

0 

0 

0 

0 

> AP*1J 

R'p,T-l  - 

‘ Rtp,T  Tp*1,T 

, Vi.r  *Vr-i  . 

By  symmetry,  we  establish  the  identity 


Thus  postmultiplying  ( 15  ) by  R'rpj.y  &P+\,T » where  R~r  is  the  inverse  of  Rr,  and 
then  subtracting  the  result  from  the  right  hand  side  of  ( 13  ),  we  have  the 
following  order  update  recursions  for  Apj  and  ^ pj  • 


Ap*\,T  * 

AP,r 

- 

0 

. 0 . 

. V- 1 . 

R‘rp,r- 1 AP*l,r 


(19) 


R 1 


P*1,T 


P,T 


- A1 


p*i,T  K p,T- 1 *vi,r  • 


A similar  set  of  order  update  recursions  for  and  Rr pj  are  obtained  as 


BP*i,r  ■ 

0 

- 

i 

t 

i 

L • J 

*_<P,r  AVl ,T 


(20) 


(21) 


Rr 


p*l,T 


p,T-l 


- A 


i»*i,r  AVi,r  • 


(22) 


To  obtain  the  order  update  recursions  for  , we  first  observe  that  the  last 
block  row  of  R~lp  T is  equal  to  R ~rp  y BJPif-  (This  can  be  obtained  from  the  normal 
equation  for  B r.)  Thus  from  the  definition  of  Cpj  , we  can  obtain  the  last 
block- row  of  Cp  j as 

last  block  row  of  Cp  T - T:T-p ] “ ^~T p,T  rp,T  • 

Using  the  relation  ( 4 ) , we  have 


o' 

m 

0 

P,T  * 

X XX 


>r-i 

>T-p*l 

iCp,T 


CP,T 

o 

(24) 


where 
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6Cp,T  - [ last  block  row  °fRP*l,T  ] C CP,T  • 0 ] T 


Here  we  want  to  show  that 


6Cp,T  " yT-p-\\[T-p:T]  • 

Partitioning  as  given  in  (4),  we  write  the  Normal  Equation  for  Bp+\tT  35 


follows 


V x 


X1  z 


where 


■ t s'V S<V  )T 

i.e.f  B*  j y ls  the  block  vector  formed  by  the  first  p elements  of  Bp+\j- 
Thus  we  can  rewrite  (26)  into 


Rp,T  B*pA,T  + x 


xT  B* 


P*1|T  + 


From  the  first  equation  above  we  have 


Z - Rr 


xT  ■ “ B*P*\,T  Rp,T 


and  recalling  from  (9b)  that 


Cp,T  " RApJ  Y[T:T-p*l] 


equation  (25)  becomes 

iCP,T  m xT  CP,T 

m~  B*p+\,T  Y[T:T-p*\] 

A 

" yT-p-l\[T-p:T]  • 

From  the  observation  from  ( 23  ) that  the  last  block-row  of  Cp+\tT  is  equal  to 
R~'p*\J  rp*\,T  and  3150  that  the  l*3'  block-row  of  £p+1  r is  I,  we  can  obtain  the 
order  update  recursion  for  C T 

CP+1,T  " [ CP,T  1 + Bp*l,TR'rp*l,Trp*l,T  ■ 


2.2.7 


Also  along  the  same  lines  an  alternate  update  for  CpT  is  obtained  as 
CP*1,T  * f 0 1 + Ap+1,T  *"%♦!, T ip*\,T  • 


'p,T-  1 J 


(28) 


Thus  equations  ( 19  ) - ( 22  ) , ( 26  ),  ( 28  ) give  a set  of  order  update  recursions 
for  Apj,  Bpj  and  CpT  . These  recursions  for  Ap+\j  and  ^p*l,T  are  similar  to  the 
multichannel  version  of  the  Levinson  algorithm  [WR],  [MVLK],  and  [Rob]. 
However,  the  recursions  for  Cp+ j j.  are  new  results. 


2.2.3  Time  Update  Recursions 
Using  the  relation  ( 5 ),  we  obtain 


RP,T*1  Ap,T  “ 


' R*pJ 

+ 

?7M 

0 

yr 

0 

yr-p 

0 

.yT-p*K 

eTp>7<T+l) 


(29) 


We  then  apply  relation  ( 4 ) to  [ 0 , ]T  so  that  after  post-multiplying  the 

result  by  «Tp  r(T+l) , we  can  force  the  right  hand  side  of  ( 29 ) to  satisfy  the  normal 
equation  ( 12  ).  After  some  algebra,  the  time  update  recursion  for  Ap  T is  obtained 
as  follows 


Ap,T*\  " Ap,T 


'p-\,T 


(30) 


Furthermore,  we  premtiltiply  ( 29 ) by  [ , . . . , >T 3 and  obtain 

*Tpt7M  ■ <TP,^r+1)  - yp-hT  <Tp,r(7'+1) 

and  recalling  that  “f  p.\j  is  a scalar  process,  we  rearrange  terms  to  get 

*p,T*l  “ 'p.T*7'*1)  < 1 " yP-l,T  ) (31) 


i. 
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Thus  the  time  update  recursion  for  A* T is  obtained  as  follows 


P,T*\ 


R(  (p,T*  1 f p,T*  1 

",r  1 - Vi,r 


(32) 


A similar  set  of  time  update  recursions  for  BpT  and  Rr p T are  obtained  as 

*.T 


B 


P,T*\ 


B 


P,T 


0 


1 r P.^1 

[~yp-\,T*l 


K P,T*  1 

and  for  ^p.\j  end  y pT  via 
CP,T*  1 " 


♦ :^-rl-r  *z>1 ■ 


VU 


7P-1)7’*1 
+ Ap,TA  *‘%,7V1  (p,T+\ 


(33) 

(34) 


Vl,r  " yp-l,T  + ‘Vr*!  R~P,T*  1 ‘p.TM 


P,T 


+ ( 


Tp,r* l R'(P,TA  (P,T*1  - r\,T  R~rp,T  rp,T  ■ 


(35) 


(36) 


Equations  ( 19  ) — < 22  ),  ( 26  ) - ( 28  ),  ( 30  ),  and  ( 32  ) - ( 34  ) form  a complete 
set  of  order  and  time  update  recursions  for  Ap  T,  Bp  T and  Cp  T. 

By  using  the  same  techniques,  a time  update  recursion  for  &p*\t  > which  will 
be  useful  in  the  ladder  form  implementation,  is  obtained  as  follows 


Ap*1,T*1 


+ rP,T  <Tp,7M 


p*l,T  i _ y 


P-1,  T 


(37) 


It  is  clear  now  that  the  time  update  of  &p+\j  is  in  fact  a time-average  of  the 
cross-correlations  between  rpT  and  except  for  the  special  gain  factor 

1 . The  significance  of  this  gain  factor  is  explained  next. 


1 - y 


P-1,T 
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2.2.4  Likelihood  Variable 
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I 

i 

* 
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j 

!t 

| 

I 

I 
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I 

it 


i 

I 


! 
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In  this  section  we  establish  the  significance  of  the  variable  ypT  as  a likelihood 
variable.  Consider  the  Gaussian  case  where  the  Joint  distribution  for 


{>7-  >7M >T-P  J ^ given  by 

p(yr yT.p)  - |27r2^p|-i/2  up{-\  y'lT-.T-plR^yiT-.T-pl}.  (38) 


It  can  be  shown  that  |i?^|  is  related  to  { |A*;|  , i - 0, 
related  to  { , l - 1 p } (see  [MG],  for  example)  by 

1^‘pi. 


which  in  turn  are 


(39) 


l*‘Ml  - l^jl  ( 1 - I*,-!2  ) • 


(40) 


Therefore  the  logarithm  of  (38),  becomes  a log-likelihood  function 

11  = ln|i?p|  + IMlVip 

■ £ in  |R*il  + yp 

i- 0 

* ln|Rol  + £ ln(  1-IKfl2)  + . (41) 

i-1 

We  can  indentify  the  variable  ypj  obtained  from  our  exact  least -squares 

recursions  as  the  f p appearing  in  the  log-likelihood  function.  Thus  the  y factor 

acts  as  a good  detector  for  non-Gaussian  components  in  the  observations.  Our 

simulation  results  indeed  demonstrated  that  “t p j would  take  high  values  (close  to 

1)  at  non-Gaussian  components.  It  therefore  also  acts  as  an  optimal  gain  factor  in 

that  the  gain  can  adjust  the  gains  Immediately  when  non-Gaussian 

1 • yp,T 

components  are  present  in  the  observations.  Simulation  results  are  shown  in  a later 
section  of  this  report. 


j 

I 


I 

■ 


2.2.5  Exact  Least-Squares  Ladder  Reeurrsions 

Premultiplying  ( 30  ).  ( 33  ) . and  ( 26  ) by  [ yJT ] we  obtain  the 

following  order  update  recursions  for  ipj  , rpT  and  ypj  s 

‘p*l,r  * *P,T  ~ Kfp+\,T  rp,T-l  (42) 

rp*l,T  " rP,T- 1 - Kip>\f%pJ  (43) 

*Vl,T  - yp,T  + rTp*l,r  R'rp*l,T  rP*\,T  * (44) 

where  ^p*\j  and  Krp.\j  are  the  reflection  or  PARCOR  coefficients  given  by 

k(p*i,t  - Ap*i,r  R'(p,T  (45) 

*rp*i,r  - Vi/r  R'rP,T-i  {46) 


A ptl.TVl 


a rP,T  (\,r.i 

A'*''r  * TTr^T 


The  initial  conditions  are  given  by 

<o,r  “ ro,T  * yr  '•  Y-1  ,T 

T 

RiQT  * r'd ,t  * 2 y? 

r»0 

■ *V-i  + yT  yJr  • 

for  piT  : 


‘p,r  " ‘r,r : 

rP,T  m 

rT,T  > yp,T  " yT,T  '• 

RiP,T  ■ R*T,T ' 

R\T  " 

rTt,t  • 

A P*l,T 

■ 

0; 

ap*i,p»i 

m 

yo  * p*l,p*l4 

(48) 

The  recursions  ( 45  ) - ( 47  ) compute  the  sample  cross-covariance  of  the 
forward  and  backward  innovations,  using  the  optimal  weighting  l / ( 1 - T . , . ), 
in  contrast  to  other  suboptimal  schemes  [SV], 

As  the  dual  to  the  stochastic  forms  in  [IS],  [Wak],  [Mo],  [SKM],  equations  (42)- 
( 47  ) are  a complete  set  of  order  and  time  update  recursions  to  obtain  the  exact 
least-squares  ladder  form  predictor,  which  is  shown  in  Figure  1. 
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2.2.6  Invertibility  of  Rpj 

Here  we  establish  a criterion  to  guarantee  that  Rpj  is  invertible.  From  relation 
( 5 ) we  have 

R rrt  m R 


*Plr  " 'Vr-i  + yT  yT 

yT  m [ yTr  , , yT_  T ] . 


(49) 


So,  using  the  well-known  matrix  inversion  lemma,  we  have 

RAp,T  “ r'1p,t-  1 “ R~lp,T-l  yr  [I  + yTR'lp,T- 1 yT  l'1  yT  r~1p,t-  1(S0) 
pre-  and  post- multiplying  the  above  by  yTJ  and  yT , we  have 

aP,T 


’p,T 


■ cc 


P,T 


«ZP,T 


1 + a 


P,T 


1 + a 


P,T 


where 


or 


*P,T 


yT  R~^p,T-l  yT  * 


o < y 


P,T 


(51) 


(52) 


(53) 


1 + °P,T 

Thus  when  Tp.i  r ■ 1 , recursion  will  stop  indicating  that  Rp  T is  not  invertible,  or 
equivalently  that  the  columns  of  Ypj  are  linearly  dependent.  However, in  the 
scalar  case  Rp  j > 0 if  >0"®  and  ^P,T  *s  always  invertible.  If  m > 1, 
( m - dim{  yt ) ) we  require  T i p + m.  These  singularities  can  be  avoided  by 
including  a priori  estimates  of  the  covariance  Rp,  or  equivalently  including  a 
weighted  norm  of  the  predictor  Ap  in  the  error  criterion  Several  such 

modifications  have  been  proven  useful  in  actual  implementations.  See  also  the  use 
of  the  special  quantity  7 in  our  pitch  detection  algorithm  on  Section  2.7  and  5.4. 
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,,  I 


I 


I '' 

[;! 

! 


2.2.7  Maximum  Likelihood  Estimate 

In  this  section  we  show  the  close  connection  between  the  pre-windowed 
method  and  recursive  maximum  likelihood  method  for  autoregressive  models. 

Suppose  we  have  a p-th  order  stationary  Gaussian  autoregressive  process  (for 
simplicity  we  consider  the  scalar  case  here) 

h + APa)  yt-\  + • • • + Vp)  ?t-P  m (t  > 

(54) 

where  { it  } is  an  independent  identically  distributed  zero-mean  Gaussian  random 

process  with  variances  <T2.  Given  the  observations  yT  - [ yT  ]T,  from 

this  process,  the  likelihood  function  is  given  as  [BJ] 


A - (2ir)_<IM>/2  | Rt  I*1/2  exp  { - 


y\  rt 


} . 


(55) 


with 


Rp  m E yTyTy. 


(56) 


Conditioning  on  { y0 yp_^ } , we  express  A as 

T 

A - (2 rff2))AT*m  I MP  l1/2  «/>{  - y ( 2 «(2  + y[ o.  P-  1)T  Mp  y[0,  n-1])},  (57) 

t-p 

where 


( mp~1  \j  - E ( yt*i yt > i 0-2  • 0 *i.j,*P- 

Collecting  all  the  terms  in  observations  into  a matrix  we  have 

A - (27r<r2)-(r4l)/2  | Mp  I1/2  exp{  - y ( apT  ap  ) } 


(58) 


(59) 


1 1.  V15-  V2) Vp)  ]T 


Then  it  can  be  shown  that  Sp  is  related  to  the  pre-windowed  covariance  matrix 
as  defined  in  (4)  as  follows 
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Moreover  the  maximum  likelihood  estimates  for  ap  is  obtained  by  setting  the 
i derivative  of  A with  respect  to  ap  to  zero,  and  neglecting  the  data  record  length 

i Independent  terms  (justifications  of  such  approximation  can  be  found  in  [BJ]),  we 

have  the  following  normal  equation  for  such  estimates 

2p  Ap  - [ fp\  0 Of  (62) 

y» , v" )T  <s3> 

r‘f  ■ A/  /lp  (64) 

aj 

We  can  see  now  that  the  Pre-windowed  method  is  a good  approximation  to 
recursive  maximum  likelihood  (RML)  estimates,  both  Lp  and  | M p | are 
independent  of  data  record  length,  and  asymptotic  properties  of  the  maximum 
likelihood  estimates  are  preserved,  when  T » p. 


I 


i 


2.2.8  Summary  of  Algorithm 
initialization 


1 *0,0 

«- 

r0,0 

«-  0; 

i 

*ro,o 

«-  >0  yo 

for  i <-  1 upto 

pmax 

do  Afi 

- 0; 

main  loop 


for  r 4-  I 

upto  Nsamples 

do 

1 

begin 

*o ,r  *•  ro,r  ** 

: 

Y-i,r-i  *■  0 : 

! 

i 

*■  *ro,r  *■ 

**0,7-1 

+ yT  yT\ 

q 

1 

Orderupdate-, 

end; 

procedure  Orderupdate-, 

for  p «-  0 upto  ( pmax  min  T ) do 

] begin  AP*l,T  <-AP*l,7M  + rP,T-liP,T  1 ^ 1 ~ y p-l.T-l^  • 

yP,T  yP-\,T  + rp,TrP,Tl  Rrp,T‘ 

KtP*i,T  *■  Ap*l,r  1 R*p,T  • 

K'p*\,T  *■  Ap*l,T  1 Rrp,T- 1 ; 

Vi,r  *■  Jp,r  " *Vi,r  rP,T- 1 : 

rP*l,T  *■  rp,T-\  - Kip*\,T  iP,T  > 

if  T s />max  then  begin 

Rip*l,T  *•  Rtp,T  ~ ^rp*\,T  Ap+1,T  * 

*■  RTPj-\  ~ k*p*i,t  Ap*i,r : 

end 

else  begin 

RiP* i,r  ^^p-i.r-i  + *p,t  *p,r 1 ( 1 ~ Vu-i > : 

RrP*\j  *~RT p*\,T-i  + rp,r  rP,T 1 < 1 " Vi.r-i  * : 

end; 

end; 
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2.3  Tracking  Time- Varying  Parameters 


So  far  we  have  assumed  that  our  observations  { y(t),  0 < t <T  },  are  generated  from 
a constant  parameter  model.  When  the  parameters  are  changing  with  time,  the 
general  algorithms  must  be  modified  to  track  these  changes.  Using  time  update 

/ 

recursions  of  the  algorithms,  we  apply  a simple  approach  to  the  tracking  problem 
by  including  an  exponential  weighting  factor,  ui,  or  the  so-called  fading  memory 
factor,  into  the  error  criteria. 

I 

2.3. 1 Exponentially  Weighted  Algorithms 

We  define  the  squared  error  criterion  for  the  pre- windowed  case  ( s-0,  f-T  ) as 

T 

ip,T  ■ 2 »7'-'  «V>  • <»> 

t-0 

where  ui  is  a constant  si,  ( e.g.  w - 0.99  ) so  that  the  past  prediction  errors,  being 
weighted  with  w,  will  have  smaller  influence  on  the  estimates.  Other  weighting 
schemes  such  as  using  time-varying  weights,  w(t),  are  more  complicated  (see 
Ljung[Lju],  for  example).  For  simplicity,  we  only  consider  the  case  of  constant  w. 

We  introduce  a simple  procedure  to  obtain  the  ladder  form  for  this  exponentially 
weighted  case  by  first  considering  the  simple  time-weighted  case.  The  Normal 
Equation  is  weighted  with  respect  to  the  length  of  observations,  i.e.  the  time 
index. 


4 

R*p,T  Ap,T  " [ R**p,T  .0,0... 

..Of, 

(2) 

where 

R*p,T  m 7TT  Rp,T 

9 

(3) 

** p,t  " 7TT 

• 

(4) 

This  time-weighted  Normal  Equation  ( 2 ) retains  the  same  form  as  the 
non-welghted  one,  and  the  least-squares  predictor  for  the  two  cases  must  be 
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w 


identical. 


Thus  both  the  forward  and  backward  predictors  would  remain  the  same  while 
only  the  auxiliary  quantity  Cp  T needs  to  be  modified  to 

C*p,T  * ( T + 1 ) CP,T  • ^ 

and  the  defining  relation  for  them  also  retains  the  same  form 

'*‘V  0 I it 

0 I 0 I Jj-\ 

R\t  f Ap,T  > Bp,T  ' ^pj  ] ■ I • 1 • 

o I o I yr-p* i 

0 ^r%,r  l yr-p 

with  p j*  defined  by  ^ p T' 

Similarly,  the  forward  innovations,  tpj,  and  backward  residuals,  rp  ^,  remain 
unchanged,  while  **  Ptj,  the  auxiliary  quantity,  is  modified  to 

- <r+'>V-  <7-' 

The  defining  relations  for  them  again  retain  the  same  form 

yr 
Jr-i 

?r-p-i 
yr-p 

We  next  consider  the  time-update  recursion  identities  for  the  matrix  R*pj 
which  is  given  by 

y T 

yT-p 


Thus  if  T of  the  weighting  factor  is  set  to  any  constant  r0,  we  can  interpret  the 


can  also  be  interpreted  as  the  a priori  time  constant  for  the  underlying 
time- varying  model. 

Recursion  ( 9 ) can  be  rewritten  into 


- *V-1 + ( 


UT-r 


£ ?V  • • • yJT-p  1 ~ ^*p.T- 1 } / ( 7*  + 1 ) . 


(10) 


1 


and  we  can  identify  as  some  time-varying  parameter,  \(T),  that  is  related  to  a 
time-varying  weighting  factor,  a i(T).  Indeed,  it  can  be  easily  shown  that  w{t)  and 
\{t)  are  related  by 

: «°>  ' ' ■ »» 

The  order-update  recursion  for  R*pj  retains  the  same  form  as  the  non-weighted 
case  and  is  given  by 


R* 


P,T 


R * 


P-1,T  x 

XXX 


(12) 


while  the  time-shifted  order  update  is  modified  to 


R* 


P,T 


XX  X 

x JLr*.t. 

7TT  p-U-i 


(13) 
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2.3.2  Ladder  Recursions 


Having  set  up  the  time,  order,  and  time-shifted  order  update  recursions  for 
in  ( 10  ),  i 12  ) and  ( 13  ),  various  recursions  for  other  quantities  can  be 
obtained  in  the  same  fashion  as  described  in  Task  Report  I,  therefore  we  will  only 
give  the  important  results  and  also  drop  the  superscript  from  here  on  so  that  all 
quantities  are  defined  in  the  time- weighted  context. 

Again  we  obtain  A/}+1)7',  r^j  T by  the  following 

R.  T * 


*P+1,7 

Ap,t 

- 

0 

pj 
X XX 


AP,T 

1 

o 

' P,r 

0 

0 

0 


where 


Ap*1,T  ’ C last  Mock-row  of  RpA  T ] 


2 ><-p- 1 <TPyr) 

«=  0 


P,T 

0 


and  similarly  for  Tp.itT 

Vp*l,T  m l first  block-row  of  Rp^T  ] 


L Bp,T-  1 


2 h rTp,r-i(r"I)  • 


with 


(14) 


(15) 


(16) 


“p+I.T  m 1 P*i,T 

as  given  by  the  so-called  Burg-type  lemma. 

The  forward  and  backward  reflection  or  PARCOR  coefficients  are  given  by 


(17) 


K « 


P+1.7 


Kr 


P+1.7 


Ap*l,7 

aTp+i,7  r~'p,t-i 


(18) 


(19) 


Notice  the  { -y- } factor  in  the  Krp+\j  due  t0  tlle  sWffed  time  index  of  Rrp 
The  ladder  recursions  are  given  by 


■ 

ipJ 

- ^fp*l,T  rp,T~l 

(20) 

V 

i,r 

■ 

rP,T- 1 

- Kip*\,Tip,T 

(21) 

Vw 

■ 

yp,T  * 

rV 1,7  R~T p*\,T  rp+ 1 

,T  ‘ 

(22) 

R ‘ 

PJ 

- AVi,r  R'rP,T-i  ( } 

*PA,T  ■ 

(23) 

RrP*hT  - 

Rr 

P,T- 1 

;_L 

{r+i 

} ' AP*W  Rmtp,T 

AVl ,T  ■ 

(24) 

The  time  update  for  Ap  T requires  some  similar  algebraic  manipulations  and  is 
given  by 


Vi,r*i  " Vi ,t  + c i - V1(// a+i)  " Vu  ]{  • (25) 


Initial  Conditions 

The  initial  conditions  remain  the  same  as  the  unweighted  case,  and  a time 
constant  is  needed  to  set  up  the  desired  weighting  factor. 


for  piT : 


<0,7 

’ r0,7 

- yT  ; T.ir  - 

r*o,t 

■ *ro,7-i 

+ C ~ r'o,t-i  ^ ( 

*P,T 

" <7,7  > 

rp,T  " rT,T  • yp,T 

«\,T 

m R^ipjt', 

Rfp,T  " R't,T  ; 

■ 0 ; 

A/»*i,/>*i 

yo  * p*i,p*\’ 

'T,T  '• 


(26) 

Other  aspects  of  the  ladder  form  like  invertibility  of  Rpj  remain  unchanged  from 
the  unweighted  case. 


2.3.3  Summary  of  Algorithm 


initialization 


, *0,0 

- r0,0 

4-  0; 

*v 

*•  *ro,o 

*■  >0  *0 

for  i 4-  1 upto 

pmax  do 

i Af*i 

<-  0; 

for  7 4- 

1 upto 

N samples 

begin 

• 

H 

‘o,7  •* 

\T  *■ 

if  tracking 

■ true  then 

1 begin 

l 

ttau  4- 

7 min  tau ; 

i 

• 

Ti  4- 

1 / ttau  ; 

: 1 

Tli  4- 

1 / ( ttau  + 1) 

r » 

7071  4- 

ttau  1 {ttau  + 

SI 

7170  4- 

1/7071 

end 

else 

. 

begin 

Ti  4- 

1/7; 

• 

71;  4- 

1 / (7  + 1)  ; 

;» 

7071  4- 

7 / (7  ♦ 1)  ; 

i 

7170  4- 

1/7071  ; 

end; 

Y-l,7-l  ♦* 

**0,7  *" 

0 ; 

r\t 

main  loop 
do 


>7 


4-  R 1 


0,7-1 


+ Tli  [yT  yT  - /?*q,7-1  ^ 


Ordeiil  pdatr, 


end; 


2.3.6 


' 


procedure  OrdtrV  pdatt, 

for  p *■  0 upto  ( pmax  min  T ) do 
begin 


AP*i,r 

4- 

Ap-i.r-i 

+ Tli  trp,T- 1 ( 1 " 

V 

4- 

Vi.r 

+ rp,r  rp,T 1 Rrp,T 

x*p*i,r 

4- 

AP*i,r 1 RtP>T  > 

*w 

4- 

TITO 

Ap*\,T  1 Rfp,T- 1 : 

*P+l,T 

4- 

*PJ 

- *Vi.r  rp,r-i  5 

rp+l,T 

4- 

rP,T- 1 

“ Ktp*l,T  tP,Ti 

pmax  then 

begin 

RipJ 

~ Krp*l,T  ApA,T> 

Rfp*\,T  T0Tl  Rfp,T- 1 “ Kip*l,T  Ap*l,T  ’ 
end 

else  begin 

R(p*l,T  *-Rtp*l,T-\ 

R'p+1,T  '-Rrp*l,T-\ 
end; 


+ rit'  *pj  (P,T 1 < 1 - <rt>V i,r-l  ) '• 

+ Tli  rpI<  rpT  I ( 1 - ( TiYt ) ; 


end; 


end  of  algorithm 


AP*l,r-l]  : 1 
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2.4  The  Covariance  Ladder  Form 

As  mentioned  earlier,  the  covariance  ladder  form  is  even  better  suited  for  speech 
modeling  than  the  pre-windowing  form,  since  it  does  not  require  any  windowing 
of  the  data.  This  is  of  importance  when  the  analysis  Involves  short  data  segments  as 
is  the  case  in  speech  modeling.  Strictly  speaking,  the  covariance  or  non-windowed 
form  uses  a rectangular  window  on  the  data,  i.e.  it  uses  only  "the  available  data 
set".  The  weighted  pre-windowing  form  uses  an  exponential  window  (on  the  error 
sequence)  that  acts  like  a sliding  window  of  a covariance  type.  Therefore  the 
behavior  of  the  weighted  pre-windowed  form  is  expected  to  be  similar  to  a fixed 
sliding  window  covariance  form  (a  slightly  more  complex  covariance  type  form). 
To  derive  the  covariance  form  algorithms  we  use  the  notation  introduced  in  Section 
2.1  . 

2.4.1  Basic  Definitions 

The  covariance  matrix,  ^Pls,T>  satisfies  the  following  recursive  identities: 


Rp£,T  m *W-1  + 


yr 


J'T-I 


t?Tr ?V-p] 


(l) 


P&T  m Rp£+1,T  * 


W 


W 


ys 


*p 


L ys 


t/s-p >Ts] 


XXX 

p-iAT-i 


* R 


P-l^l  ,T  x 

XX  X 


(2) 


(3) 


(4) 


where  the  x's  represent  unspecified  elements  along  the  appropriate  edges. 
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Define  Ap^T  , B^j  , Cp(S  T and  DptST  by 


Rp£,T  [Ap£,T'  Bp&T'  Cp&T‘  Dp&T  3 “ 


1 

0 

l?T 

o I 

0 

l3»r-i 

1 . 

1 Js+p-l 
i . 

c 1 

0 

1 * 1 • 

1 yr-p*  il  >s*i 

0 1 

Rfp,S,T 

1 yr-p 

1^5  J 

£5) 


where  ApST  and  Bpts,T  are  respectively  the  forward  and  backward  predictors, 
with  ap,S,T  8114  Bp,S,T  *y 

A\#,T  " [Im  - Ap£,TW BP£,T{P)  3 • 

• 'V V/1'  ■«  1 <6> 

and  Cp  5 j.  and  <Dps  f are  auxiliary  quantities. 

We  define  tpisj>  the  forward  prediction  errors  or  innovations,  and  rpSr,  the 
backward  prediction  residuals,  and  auxiliary  scalar  quantities  gPt$j  and  ^p,s,T 

yr 

yr-\ 


*P&T 

a\&t 

rp£,T 

m 

b\s,t 

*P&T 

C'p&T 

*p&t. 

yT-P*i 

yT.P 

and  one  more  auxiliary  scalar  quantity  fpT  by 

fp£,T  " D\&T  y[S*P  :S  ] ■ 


(7) 


(8) 


The  quantities  • 8p£,T  and  ^p&T  Ul>  related 


r , i 

Sp&T  rtPls,r 

ir  y\-\  • 

i 

a 

*V\ 

fpfi,T. 

?Ts+/>  y1  s*p-\  • 

• >Ts. 

m 

ir  /r-i  • 

■ y\-p 

y s*p  y s*p- 1 ■ 

■ is 

?S*p 


yr-p+i  ys* i 
. yr-p  ys 

1 °P,S,T  • Dp£,T 


i.  0 


] 

(9) 


We  now  let  the  basic  observation  record  span  0 - S s t & T,  and  drop  subscript  0 


when  no  confusion  is  created. 


I 


2.4.2  Order  Update  Recursion* 

We  would  like  and  Bp+\j t0  satisfy  the  normal  equation 


c Ap*i,t  • Bp*i,t  ] 


p*l,T  I 0 
0 | 0 

0 l *W  J 


(10) 


and  we  start  by  using  relation  ( 4 ) : 


Rp*l,T 

ApXT 

■ 

; 

' 

0 

W 


XXX 


apxt 

■ 

0 

. 

0 

Ap*i,r . 

(11) 


where 


AP*i,r  * 1 last  Uock-nu>  °f  Rp+i,t  ] 


pXT 

0 


(12) 


Similarly,  using  the  relation  ( 3 ) : 

xxx 


Rp*i,t 

0 

- 

. Bp,r-1. 

x R 


p,T-l 


0 

m 

Tp*UT 

0 

. 

. Bp,r-i . 

1 - 

(13) 


where 


rp^l,r  * [first  block-row  of  RpA  T ] 


L Bp,t-  1 J • 


(14) 


Applying  the  so-called  Burg-type  lemma. 


0 

Rp*\,T 

ApXT  0 

. ° ST^-1. 

. 0 Bp?-K 

A\XT  0 


0 & 


R*pXT  Tp+\,T 


Ap*i,r  *rp,r-i 
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r\t  Tp*\,t 
. ap*i,t  *rpj-i  . • 

note  that  these  expressions  are  all  symmetric  matrices.  Therefore  we  get  the 
important  identity 


Ap*\,T  “ rVl,T  <15) 

Thus  postmultiply  ( 13  ) by  R'rpj.\  Ap*\,T  • where  R'r  is  the  inverse  of  Rr,  and 
then  subtract  the  result  from  ( 1 1 ),  to  obtain  the  following  order  update  recursions 


for  Ap  T and  R(pT 


Vu  ■ 

Ap,1,T 

- 

0 

0 

L J 

fp,T- 1 . 

R~'p,T-  1 AP*l,r 


(16) 


P*i,r 


P.1.T 


- A1 


P+1|T  R~r p,T~\  Ap*\,T  • 


(17) 


A similar  set  of  order  update  recursions  for  Bpj  and  Rr pj is  obtained  as 


Bp*i,t  - 

0 

- 

ApA>T 

.BP,T- 1 . 

0 

r'*p,i,t  a\*i,T 


(18) 


Rr 


P*l,T 


P,r-1 


- Ap*i,r  R~* p,i,t  AVi ,t 


(19) 


To  obtain  the  order  update  recursions  for  CpT  , we  first  observe  that  the  last 
block  row  of  R~lpj  is  equal  to  R~rpj  BJ  p,T-  Thus  from  tlxe  definition  of  Cp  T , we 
can  obtain  the  last  block-row  of  CpT  l.e. 


n-p  J 


(20) 


lnplles 


[ last  block-row  of  Cp  F ] - R~rpj  rpj 


(21) 


2.4.5 


Using  the  relation  ( 4 ) , we  have 


RP*\,T 

CpA,T 

• 

RpXT  x 

cp,i,t 

X 

0 

■ a 

X XX 

c 

n 

?T- 1- 

n-P 


(22) 


and  from  the  observation  ( 21  ) that  the  last  block-row  of  is  e<Iual  to 

R~rp*\,T  rp*\,T  and  8180  that  the  last  bloclt"row  of  Bp*\,T  is  ^>n>  W8  can  ot,tain  the 
order  update  recursion  for  Cpj  as 


'p*1,T 


W 

0 


+ Bp*\,T  R~fp*\,T  rp+ 


(23) 


Similarly,  (he  order  update  recursion  for  Dpj  is  obtained 


>i,r  " 


LDP,r-iJ  • 


+ AP*\,T  *'%♦!, T Vl,7^+ 1) 


(24) 


Thus  equations  ( 16  ) - ( 19  ) , ( 23  ) and  ( 24  ) give  a set  of  order  update 
recursions  for  Bp  j,  CpT  and  Dpj.  These  recursions  are  very  similar  to  the 
multichannel  version  of  the  Levinson  algorithm  ([WR],  [MVLK],  and  [Rob]). 


2.4.3  Time  Update  Recursions 


Using  the  relation  ( R 1 ),  we  obtain 

R „ , A 


m 

Rtp,T 

+ 

?r*i 

0 

• 

0 

■ » 

L?7M-p. 

«vr+i) 


(25) 

We  then  apply  relation  ( 3 ) to  [ 0 , & p.\j  ]T  50  that  after  post-multiplying  the 
result  by  *t/Ji7’(T+1)  , we  can  force  the  right  hand  side  of  ( 23  ) to  satisfy  the  normal 
equation  ( 10  ).  After  some  algebra,  the  time  update  recursion  for  ApT  is  obtained 
as  follows 


P.7V1 


P,T 


lCp-l,T 


Furthermore,  we  premultiply  (25)  by  .....  /jM-p  ^ and  obtain 

*Jp,T+l  " |Tp,7^T+1>  * SP-l,T 
and  recalling  that  &p.\j ls  a scalar,  we  rearrange  terms  to  get 

*p,7M  " tp.T<7'+1)  < 1 **  8p-\,T  > 

Thus  the  time  update  recursion  for  Rip  T is  obtained  as  follows 


p,T*l 


R* 


P,T 


*p,JM  « p,T*l 

1 * *P-1,T 


(26) 


(27) 


(28) 


(29) 


A similar  set  of  time  update  recursions  for  Rpj  and  Rrpj  is  obtained  as 


P»T*1 


B 


p>i 


'p-i,i,  r*i 
o 


r P,T*  1 
l~fp- 1,1, 7M 


Rr 


p,T*\ 


Rr  + \pI21  1Lb.T*l  . 

P,T  1 - SP.\,\,T*\ 


(30) 

(31) 


Equations  ( 16  ) - ( 19  ),  ( 23  ),  ( 24  ) and  ( 26  ) - ( 31  ) form  a complete  set  of  order 
and  time  update  recursions  for  A pT , £pj,  CP,T  and  Dpj. 


2.4.4  Time-Shifted  Recursions  and  Scalar  Updates 
By  using  the  same  techniques  for  obtaining  the  order  and  time  updates,  we  can 
also  obtain  the  time-shifted  updates  of  various  quantities  and  updates  for  the  scalar 
quantities.  The  time-shifted  updates  for  R*  pj.  Rrpj<  & Pit>  ^p>T<  ^pj  411,1  ^p  T are 
given  by: 


RipA,T 

■ RIPJ 

R * t>  t 

C / + pjjp 

,7 <P'>t\,I<P)  j.i 
/P,r 

Rrp,T- 1 

" R'p,T 

1 - 

ApA,T 

" [AP? 

Dp,Ti\^ 

l-fP,T 

] ^.l.T 

Bp,T-  1 

" [BP,r 

CP,T  r\^T) 
1 ’ gP,T 

3 R~rp,TRrp,T- 1 

CP,hT  “ 

CP,T  + Dp,T  ■ 

hP,T 

1 _4,T 

°P,T- 1 " 

Dp,T  + CP,T  ■ 

hP,T 

1 _ *P,T 

The  scalar  updates  torfp  T,  gpT  and  hpT  are  given  by 

fP*\,T  “ fP,T-l  + t\*U1^P*^R‘tp*l,Tip*l,1<P+^ 

m fP,T  + ! 3^“  + iJp+l,1<P+l'>R'tp+l,Tip.l,T<P+l) 

tp*l,T  " Sp,l,T  + R~rp+l,T  rp*l,T^7"> 

• gp,T  + 1^“  + rVi,r(7')  R'rpA,T  rp+\tlV"> 

hpA,T  m hp,\,T  * ^ pAjW'*  R~f p*\,Trp+\,l(P+{y> 

" hp,T- 1 + ^p^P*1^  R'(p*l,Tip*l,T^I">  ■ 


A time  update  recursion  for  &p,\j  • which  will  be  useful  in  the  ladder  form 
Implementation,  can  be  verified  to  be  given  by 


^p*l,T*l 


Vi,r  + 


rp,T 

1 " *P-IA,T 


(32) 
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2.4.6  Ladder  Type  Realization 

Premultiplying  { 26  ),  ( 30  ) , and  ( 23  ) by  [ y*T ^T-p*  1 ^ we  obtain  tlie 

following  order  update  recursions  for  ipj  and  r y. 


W”  ■ 

Wr)  - *Vi,rVr-i<7'~1) 

(33) 

wr)  ■ 

rp,T- 1*7"-1*  " K‘pA,Tip,l,^ 

(34) 

' 

{time  update) 

(35) 

- ♦ >p,w  , 

(36) 

where  K* p+i  r and  £rp,ir  3X6  the  reflection  or  PARCOR  coefficients  given  by 

Kip*hT 

aTp+i,t  *'%,  1,T 

(37) 

" AP*\,T  R~rp,T- 1 

(38) 

The  recursions  ( 33  ) - ( 38  ) compute  the  sample  cross-covariance  of  the 
forward  and  backward  innovations. 

In  contrast  to  the  approximate  schemes  ( [SV],  [MG]  ) that  have  been  presented 
to  solve  the  stochastic  ladder  forms  given  in  [IS],  [Wak],  [Mo],  equations 
( 33  ) - ( 38  ) form  a complete  set  of  order  and  time  update  recursions  to  obtain  the 
exact  least-squares  ladder  form  predictor,  which  is  shown  in  Figure  1.  . 
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Figure 


2.4.6  Initial  Conditions 


In  the  covariance  ladder  realization,  we  may  not  start  the  recursion  at  T - 0 and 
just  keep  doing  order  and  time  updates.  This  is  because  for  small  values  of  T,  Rp,o,t 
is  always  singular.  More  specifically  if  T < p(m  + 1)  + m - 1,  where  m - dim  yr  then 
Rp,0,T  certainly  be  singular,  for  then  Y q j will  be  of  low  rank.  To  properly 
initialize  the  covariance  ladder,  we  can  wait  until  enough  data  is  obtained,  say  { yQ, 

• • • ■ >ri  }>  so  that  Rpfi,Tl is  nonsiagular  for  all  orders  p to  be  considered.  Thus  if 
pmax  is  the  maximum  order  of  the  ladder,  we  have  only  to  check  the 
nonsingularity  of  ^pmaX)o,n  since  in  this  case  *>e  nonsingular  for  p s 

pmax  , T iT  1 as  can  be  easily  verified. 

In  practical  situations  (e.g.  the  scalar  case),  it  is  to  be  expected  that  T 1 is  close  to 
2 pmax.  Now  suppose  we  have  determined  7*1  such  that  ^f>ma*)o,ri  is  nonsingular. 
From  { , . . . , } we  recursively  compute  the  last  row  of  of  Rp  0 for 

p - 0,  ....  pmax  and  keep  the  first  pmax  values  { yQ ypmax  }•  Using  the 

recursions  given  above,  we  can  then  compute  Apn,  ^rp,0,7T  and 

(P,0,T1<P)  for  p - 0,  . . . , pmax,  which  constitute  the  initial  conditions  for  the 
time  update  of  these  quantities. 

The  initial  conditions  for  order  updating  the  ladder  recursion  are  given  by 


2.4.7  Invertibility  of  Rp  T 

Here  we  establish  a criterion  to  guarantee  that  Rpj>  is  invertible.  From  relation 
( R 1 ) we  have 


V ■ V-i  * yyl 

yT  ■ [ yj  . • ■ • . yj.p  3 • C4i) 

So,  using  a well  known  matrix  inversion  lemma,  we  have 


r-\,t  - r-\t- i - r'\,ta  y n * y'  rAp,t-i  y yT  r-'p,t- ■ • 

Pre-  and  post-multiplying  this  by  yT  and  y , we  get 

aZp,T 

b,T  - - 77^  • 

where 


(42) 


(43) 


v-  ■ yl  R'lp,r-i  y > o 


(44) 


or 

0 * Sp,T 

A similar  argument  shows  that 

0 < fp,T  < 


aP,T 

l+aP>T 


< 1 . 


(45) 

(46) 


Thus  we  see  that  the  invertibility  is  equivalent  to  the  condition  ( 45 ),  ( 46  ) so  that 
divisions  by  1 - g j>,  and  1 - fp  j can  be  carried  out. 
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2.4.8  Summary  of  the  Algorithm 


initialization!  ( See  Text ) 
main  loop 


for  T *■  T 1 upto  N samples  do 


1 

begin  «0,r 

r0  ,T 

*-  : 

*■ 

8-1, T-l 

- 0 ; 

\ ■ 

Ri0,T 

r\t 

*■  R*0,T-l 

+ yT  yT 

1 

j 

Rt0,T  ” 

Rr0  ,T 

*■  RiQ,T- 1 

+ yT  yT 

4 

Ordtri)  pdatr. 

end; 


Procedure  OrderU  pdatr, 

for  p «-  0 upto  pmax  do 


begin  tpjip)  «- 

iP,T- lW_ 

*p,r  Vu-i 1 < 1 ~ 8p-i,T-i>  '• 

*pXT  - 

(P,T  + 

iP^P'>  V i,r-i 1 ( 1 "/p-i.r-P  s 

Ap*\,T  <- 

AP*lT-l  + 

rP)r-i  W/(1'Vu,m  ); 

8p,i,r-r 

^PJ-I  + 

^2p,r-i 1 < 1 _/P)r-i k 

SP,T  - 

8P-\,T-l  + 

‘p7  'p?!  Rip,T’ 

hP,T  *■ 

Vi.r-i  + 

(P,T<P)  (P,T  1 R*pf 

fpj  *■ 

fp-l,T-l  + 

ip,Jl'P)  ip,T^P)  1 ^p.P 

r6p,1,T  *- 

‘p^P^p^P)  1 < 1 ~fp-\,T-0  < 

K P+l.T 

4- 

ap*i,t  1 rp,i,t  • 

k'p+i,t 

4- 

VlJ 

1 R'p,T-l  *. 

(P*1,T 

4- 

(pXT 

- *Vu  rp,r-i  •> 

rp*l,T 

♦- 

rP,T- 1 

■ *%*i,r  *p,i,r : 

R‘p-UT 

■ *Vi,r  Ap»i,r  J 

*Vi.r 

^rp.r-i 

“ Ap*i,r : 

end; 


end  of  algorithm 
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2.6  Pole-Zero  or  Autoregressive  Moving 
-Average(ARMA)  Ladder  Forms 

In  this  section  we  present  ladder  forms  for  pole-zero  or  autoregressive 
moving-average  (ARMA)  models.  Our  approach  here  Is  to  embed  the  underlying 
ARMA  model  Into  a two-channel  AR  model  resulting  in  a Joint  Innovations 
representation  of  the  process.  The  two-channel  predictor  of  this  Joint  process  may 
be  implementated  as  a two-channel  AR  ladder. 


2.6. 1 Joint  Innovations  Representation  of  the  ARMA  Process 
Given  a pole-zero  (ARMA)  model  of  the  following  form 

h * ^1>»-1  + • • • + ANh-N  " fl0  ut  * B1  Vl  + • • • + BNut-N • 
where  { yt,  0 s t s T } are  m-vector  observations,  { u, } the  input  process  which  is 
assumed  to  be  an  uncorrelated  sequence  of  m-vector  random  variables,  and  An  and  Bn, 
the  m by  m matrix  coefficients  of  the  model. 

The  model  equation  can  be  rewritten  as 


?i  + A1  J'l-l  ■*••••+  ~ B1  “i-l  “ • • • - bn  Ut-N 

and  in  matrix  notation,  we  have 

V yi  - "V  ■ V.  • 

with 


aN  " I 'm  ■ Al'  Al' 

*V  - I «.  • «!• 

y;  - 

■v  - [«/■••• 


• • • an 


3. 


. . , Bn 

W 3 • 


3. 


leading  to  the  following  augmented  equation 


a T 
aN 

k 

« 

V 

■ 

V* 

0 

V . 

.V 

. tt«  . 

where  50  is  the  first  m by  m block,  unit  vector. 


*0  “i  • 


(2) 


(3) 


(4) 


(5) 


(6) 
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This  embedded  model  can  be  interpreted  as  a 2-channel  AE  model  of  the  joint 
process  {yt,  ut }.  Here,  the  right  hand  side  of  the  augmented  equation  is  equal  to 
the  joint  innovations  of  { yt,  ut },  since 


r«l 


1 


L(*.  J - “m-ij  L “i  J • (7) 

Indeed  if  we  apply  a simple  interleaving  permutation  of  (1,  3,  5,  ...  , 
2N+1,  2,  4,  6,  . . . 2N+2  ) to  the  augmented  equation,  we  have  the  familiar  joint 
2-channel  one-step  predictor  of  the  form 

V2i  “ (t  • (8) 


where 


lm  °m  A1  ~B\ 


AN  ~B1 V 


L°m  Om. 


ut T yt-N* • J • do) 

In  the  stochastic  case,  the  problem  of  finding  the  linear  least-squares  predictor 
is  then  reduced  to  solving  the  normal  equation  of  the  following  form 


where 


^ • • • 0 ]T  , 
V“ 0 

^1T  . . . 

Ry-N.EZizt^. 


ty,.n  . #N 


• . R^n 


E «,  l,T 
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(14) 
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* (o)T 
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1 
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O 
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Q 

a 

Since  ■Ryu^  is  a block  Toeplitz  matrix  with  blocks  of  size  2 m by  2m,  the  solution 
to  this  normal  equation  can  be  obtained  via  the  Levinson  algorithm  [WR].  The 
parameters  estimates  which  are  embedded  in  the  solution  are  recursively  computed 
in  order.  (Note  the  sparseness  of  R**  and  An.) 


2.6.2  Least-Squares  Recursions  for  ARMA  Models 
In  the  deterministic  least  squares  case,  the  joint  forward  and  backward 
prediction  errors  of  order  n at  time  T is  defined  as 


n,T 


L‘V 


yr  - >rir-ir„)r-« 

A 

UT  ~ ur!7'-l_,7'-n 


- Any  Z[T:T-nl  , 


(15) 


rn,T 


r*. 


n,T 


rvj 


yT-n  - yT-n\T-n*\^,T 


Lur-n  - uT-n\T-n*\r~,Tm 


V ztrT-"1 


(16) 


The  covariance  matrix  for  the  Joint  process  is  thus  similar  to  that  given  in 
Section  II,  and  with  its  shifting  properties,  one  can  obtain  recursions  for  the 
2-channel  ladder  form  from  the  recursions  of  an  AR  case,  i.e. 

in*\,T  " (n,T  " R''n,T-l  rn,T-\  (*?) 

rn*l,T  " ri»J-l  ' Rm*n*l,T  fn,T  • <18> 


In  the  two-channel  case,  the  partial  correlation  matrices  exhibit  addition 

structures,  due  to  the  embedding  of  the  white  input  process  of  the  ARMA  model 


into  an  AR  model.  Since  all  future  inputs  are  assumed  to  be  uncorrelated  with  the 


present  and  past  outputs,  half  the  elements  j matrix  are  approximated  by 
zeros.  Indeed  the  partial  correlation  matrix  is  given  by 


^♦l,r“ 


^yn*\,T  °m 


rl 


2 wV')T 


«■«+! 


• 2 ui-n*l*yn,T<f)T  °m  J 
t-n*l 


(19) 


The  time-updates  for  these  partial  correlations  obtained  in  a similar  manner  as  in 
Section  II. 


^yn*l,T*l 
AUn* \,T*l 


< < w 


a v . r 'M  »i‘ i 

A ”-1'T  * ‘ i-w  1 

r“„TT.  <y„r*iT 

a u r n,r+l  i 

A "'•T  * ‘ > - Vi,r  ’ 


(20) 

(21) 


In  order  to  obtain  ladder  recursions  for  the  prediction  errors,  inversions  of  the 
joint  2m  by  2m  prediction  error  covariances,  and  Rrnj  are  needed.  These 

inversions  are  simplified  by  first  decomposing  them  into  upper-diagonal-lower 
form  and  then  taking  inverses.  Examples  of  ladder  forms  obtained  this  way  are 
presented  in  [Lee].  Figure  2.5. 1 shows  an  example  of  one  such  structure. 


Notice  the  feedback  structure  of  the  embedded  ARMA  ladder  form.  The  output  of 
the  forward  Innovations  section  is  fedback  as  input  to  the  backward  residual  for 
{ ut }.  Assuming  that  { u{ } has  unit  variance,  we  have 

uT  m R~ty/iNj  iyNj  ■ (22) 

Instead  of  using  one  single  feedback,  as  in  ( 22  ),  it  is  also  possible  to  obtain  uj>  by 
distributed  feedbacks.  Ladder  forms  of  such  kinds  are  still  under  investigation  and 
results  of  these  studies  will  be  available  in  later  publications. 
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2.6  Computational  Complexity 


One  of  the  attractive  features  of  the  class  of  algorithms  described  in  the  previous 
sections  is  their  computational  efficiency.  The  computational  requirements  of  the 
fast  algorithms  are  in  fact  proportional  to  the  number  of  model  parameters.  This 
compares  favorably  with  currently  used  methods  as  indicated  in  Table  I . There  are 
many  ways  of  measuring  computational  complexity;  the  measure  adopted  here  is 
the  number  of  multiplications  per  N samples  of  speech  (N  may  be  the  number  of 
samples  per  frame)  for  a p-th  order  autoregressive  model.  All  index  calculations  are 
ignored.  For  methods  that  store  the  full  covariance  matrix,  two-dimensional  array 
indexing  has  to  be  performed  which  may  be  costly  on  a machine  having  only 
integer  arithmetic.  Our  fast  algorithms  compute,  in  addition  to  the  model 
parameters  and  gains,  numbers  that  allow  us  to  efficiently  compute  the  log 
likelihood-ratio  used  for  pitch  extraction,  i.e.  decomposition  of  the  driving  process 
into  a jump  process  and  a Gaussian  process.  Many  LPC  systems  use  more 
computations  in  the  pitch  detection  section  than  for  the  actual  modeling,  this  is  not 
the  case  for  our  algorithms.  For  optimal  vector  quantization  schemes  the  number 
of  operations  is  also  much  larger  than  the  number  required  for  the  model  parameter 
computations. 


Table  1 Comparisons  of  Operation  Counts  for  Fast  Algorithms 


Estimated  number  of  operations  per  frame  of  size  N and  model  order  p, 
multiplications  denoted  by  *,  divisions  by  /,  square-roots  were  Ignored. 
The  number  of  additions  Is  equal  to  the  number  of  multiplications. 


Pre&Post-Wind.* 

Pre-Windowing 

Non-Windowing 

MEM-type* 

AR 

|(Np+p2)*  + p/  | 

N(4p*  ♦ 4/) 

N(6p*  + 6/)  | 

Np(2*  «■  2/)  | 

AR  Ladder 

|(Np+p2)*  ♦ P/  1 

Np(6*  ♦ 4/) 

Np (11*  + 10/)  | 

Np( 5*  + 2/)  | 

ARMA  Ladder 

I (Np+3p2)*  ♦ p/| 

3Np(6*  ♦ 4/) 

3Np( 11*  + 10/) | 

not  available  | 

References 

[Lev],  [Mo] 

[MLNV] 

[MVL] 

[Burg] 

Actual  program  complexity  depends  heavily  on  the  computer  language  used.  For 

example  most  of  our  algorithms  require  programs  of  a half  a page  to  a page  of 

clearly  structured  code  of  the  ALGOL  type  language  MAINSAIL.  (A  FORTRAN 
program  would  take  about  5 times  more!) 


* The  Levinson  and  the  MEM  - Maximum  Entropy  Methods  are  not  recursive  in 
time,  i.e.  they  would  have  to  be  recomputed  for  every  change  in  frame  size  or 


2.7  Implementation  Issues 


111! 

Generally  speaking,  the  implementation  of  the  exact  recursive  algorithms  is 
fairly  straight-forward,  if  at  least  some  care  is  taken  that  is  advisable  in 
implementing  any  algorithm  on  a computer.  Since  we  implicitly  solve  linear 
(normal)  equations,  the  usual  precautions  for  solving  linear  equations  apply  also  to 
our  case,  such  as  conditioning  (ratio  of  largest  to  smallest  eigenvalue),  consistency, 
etc.  I.e.  if  fewer  data  samples  than  parameters  are  available,  the  set  of  equations  is 
over-determined  and  the  covariance  matrix  or  prediction  errors  are  singular,  a 
situation  that  can  arise  during  startup  of  a (recursive)  algorithm.  Some  of  these 
properties  can  be  observed  in  the  spectrum  of  the  (speech)  signal  used.  For  example, 
a sum  of  a finite  number  of  sine- waves  without  noise  leads  to  a covariance  matrix 
of  finite  rank  (twice  this  number),  i.e.  if  the  number  of  parameters  is  more  than 
twice  the  number  of  frequencies,  the  covariance  matrix  is  singular.  Numerically  it 
is  very  possible  that  the  high  order  prediction  errors  can  become  negative  and 
consequently  the  prediction  filter  leads  to  unstable  models.  It  is  on  the  other  hand 
possible  for  recursive  algorithms  to  have  a "self-healing  property",  that  is  they  can 
converge  back  to  a stable  solution.  For  a time-varying  model  this  is  permissible 
since  the  output  power  can  still  be  bounded;  however  if  the  model  parameters  are 
sampled,  for  Instance  at  the  end  of  a transmission  frame,  the  sample  could  lie  in  an 
unstable  region  that  would  lead  to  unstable  reconstructions. 

Another  area  of  concern  during  the  implementation  process  is  the  implicit 
(statistical)  assumptions  one  makes  by  choosing  particular  Initial  states  (say  zero) 
in  recursive  algorithms.  These  choices  correspond  to  particular  assumptions  about 
the  a priori  distributions  of  parameters,  which  in  turn  might  not  be  consistent 
with  the  actual  Information  available.  For  instance  assuming  zeroes  for  data  before 
the  start  of  the  first  data  sample  is  equivalent  to  assuming  the  data  is  a white  noise 
(the  predictor  is  initialized  as  a feed-through)-,  however  we  are  working  here  with 
speech,  i.e.  we  have  at  least  an  idea  what  an  average  spectrum  looks  like  (say  from 
300  to  3kHz).  In  order  to  achieve  top  performance,  many  of  these  Issues  have  to  be 


to  some  underlying  assumptions  that  are  inconsistent. 

To  demonstrate  this  we  discuss  two  practical  problems  which  appeared  during 
the  implementation  of  these  algorithms.  The  first  one  is  concerned  with  the 
accurate  tracking  of  model  parameters  when  strong  transients,  i.e.  nongaussian 
driving  process  (e.g.  pitch  pulses  or  plosives),  are  present  in  the  initial  data 
samples.  The  second  problem  deals  with  the  capability  of  fast  resets  of  the 
recursive  algorthms  when  the  underlying  model  parameters  suddenly  decrease  to 
zero,  i.e.  accurate  tracking  of  zero- valued  parameters  (e.g.  no  speech). 
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2.7. 1 Non-Gaussian  Driving  Process 


Non-Gaussian  signals  can  have  two  effects:  either  they  can  help  to  track 
parameters  extremely  fast,  i.e.  within  a few  samples,  or  they  can  "kick  the 
parameter  estimates  way  off",  which  leads  to  very  slow  convergence  to  the  true 
value  of  the  parameter.  Our  algorithms  produce  a likelihood  type  quantity  that  can 
detect  the  presence  of  nongaussian  components  with  extreme  sensitivity.  The 
second  condition  can  be  remedied  or  prevented  as  follows.  A non-zero  (small) 
initial  prior  con  variance  estimate  of  the  input  process  should  be  used  and  a linearly 
time  varying  weighting  factor  {Tau)  up  to  the  maximum  window  length  (constant 
Tau ) is  applied  to  the  errors  in  the  least-squares  criterion.  In  the  program  this  just 
implies  that  the  variable  Tau  is  initialized  to  some  initial  weighting  (say  Tau(O)  « 
10*order)  and  linearly  increased  proportional  to  the  running  time  si  up  to  a fixed 
window  time  constant  ( sayTaumax ) and  then  held  constant  (i.e.  Tau  - ( si  - 1 Omordtr) 
min  (' Taumax ) ).  This  is  consistent  with  the  knowledge  that  speech  signals  are 
expected  and  not  just  nongaussian  inputs  or  even  out-lyers  such  as  switching 
noises.  The  problem  really  is  that  the  condition  number  (extreme  eigenvalue  ratio) 
of  the  covariance  matrix  can  get  very  large  if  the  first  sample  is  unexpectedly 
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large.  For  example,  If  the  a priori  estimate  of  the  power  is  small,  say  1/10,000,  and 
the  first  sample  is  a modest  10,  its  square  is  100  and  the  condition  number  is  equal 
to  the  ratio  102/(1/10,000)  or  one  million.  For  6 significant  figures  in  integer 
arithmetic  this  is  already  at  the  end  of  the  range,  i.e.  the  covariance  would  be 
numerically  singular.  Another  good  numerical  alternative  is  the  square-root  form 
of  the  normal  equation  or  of  our  ladder-forms.  The  solution  using  a nonzero  a 
priori  covariance  estimate  is  simpler,  but  ultimately  we  would  prefer  the 
square-root  versions  since  they  can  gain  a factor  of  two  in  effective  wordsize. 
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2.7.2  Tracking  Zero-Valued  Parameters 

Another  problem  is  the  tracking  of  zero-valued  parameters.  When  the 
underlying  parameters  suddenly  decrease  to  zero  they  can  reduce  the  data  samples 
to  very  small  values,  conversely,  if  the  signals  get  very  small,  the  parameter 
estimates  are  not  much  excited.  Special  cases  of  this  are  when  the  correlation 
between  two  signals  goes  to  zero  or  the  model  order  of  a time-varying  model 
decreases.  In  all  these  cases  the  tracking  of  the  zero-valued  parameters  becomes 
very  slow,  e.g.  exponentially  in  the  case  where  an  exponential  window  is  used  on 
the  error.  This  is  a basic  problem  in  parameter  estimation.  Briefly,  the  reason  is 
that  the  time-update  is  obtained  by  minimizing  a squared-errors  criterion.  When 
inputs  are  zeros,  the  time-update  for  the  pre-windowed  unweighted  case  just  take 
the  previous  estimate  as  the  optimal  estimate,  because  the  error  criterion  "doesn't 
care"  what  the  non-excited  parts  of  the  system  does!  The  existence  of  an 
exponential  weighting  function  in  the  error  criterion,  the  estimates  only  converge 
to  zero  at  the  rate  of  the  associated  time  constant,  which  is  undesirable  if  fast 
tracking  of  zero- valued  parameters  is  important  for  reconstruction  of  speech.  The 
solution  may  be  to  make  use  of  the  likelihood  ratio  parameters  to  detect  the 
condition  for  reseting  the  appropriate  parameters.  This  is  a fairly  new  problem  in 
system  identification  and  not  much  theory  is  yet  available. 
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3.  The  Speech  Transmission  System 
The  algorithms  described  in  Section  2 are  the  basic  building  blocks  of  a complete 
speech  compression  system.  The  structure  of  the  system  which  was  developed  and 
tested  during  our  Investigation  is  described  in  Section  3.1.  The  rest  of  the  section 
presents  in  greater  detail  the  various  components  of  this  system. 

I 3.1  General  System  Description 

1 

. The  basic  structure  of  the  speech  transmission  system  used  to  demonstrate  the 

I 

1 

performance  of  the  new  speech  modeling  algorithms  is  given  in  figure  3.1.  The 
sampled  speech  is  the  input  data  to  one  of  the  ladder-form  algorithms  described  in 
Section  2.  The  ladder  form  is  a recursive  type  algorithm  and  produces  a new  set  of 
outputs  at  each  sample  time.  The  model  parameters  (reflection  coefficients)  are 
sampled  at  each  transmission  frame  time  and  they  constitute  the  main  part  of  the 
per  frame  speech  parameter  vector.  The  other  components  of  that  vector  are  pitch 
information  and  residual  energy  (or  gain).  The  pitch  information  is  obtained  from 
one  of  the  coefficients  computed  by  the  ladder  form  algorithm,  as  will  be  described 
in  more  detail  in  3.2.  The  energy  parameter  is  a measure  of  the  total  energy  in  the 
speech  frame  being  transmitted  and  serves  to  control  the  synthesis  algorithm 
driving  signal. 

The  speech  parameter  vector  is  used  to  reconstruct  or  synthesize  the  speech  from 
which  4t  was  generated.  First,  however,  it  is  necessary  to  transmit  this 
information  from  the  analyzer  to  the  synthesizer.  Furthermore,  the  objective  of 
our  work  is  to  be  able  to  do  this  transmission  at  a low  data  rate.  Thus,  the 
parameter  vector  has  to  be  quantized  and  encoded  in  an  efficient  manner  (e.g.  with 
few  bits,  but  minimum  distortion).  This  function  is  performed  by  the 
encoder/decoder  which  is  discussed  in  Section  3.3.  The  coder/decoder  is  capable  of 
reconstructing  a good  approximation  to  the  speech  parameter  vector  at  the  speech 
synthesizer  input. 

The  speech  synthesizer  uses  the  model  parameters  provided  by  the  encoder  to  set 
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up  a speech  model  which  in  some  sense  reflects  the  shape  of  the  vocal  tract.  The 
pitch  and  enery  information  from  the  parameter  vector  is  used  to  generate  a 
driving  process  which  is  then  used  to  drive  the  speech  model.  The  details  of  the 
synthesis  algorithm  are  presented  in  Section  3.4. 

A basic  quantity  of  the  speech  transmission  system  design  is  the  frame  size.  The 
frame  is  a segment  of  the  speech  which  is  chosen  as  the  basic  unit  of  the 
transmission  system.  During  each  frame  time  a single  block  of  quantized  speech 
parameters  is  transmitted.  To  see  the  relationships  between  the  various  quantities 
let 

fs  = Sampling  rate  [ Hz  ] 

Ns  = Number  of  raw  speech  samples  in  a single  frame  [ samples/frame  ] 

B = Transmission  rate  [ bits/second  ] 

Nb  = Number  of  bits  per  coded  speech  parameter  vector  [ bits/frame  ] 

In  order  to  get  real  time  transmission  we  must  have  ( fs/Ns ) s ( BINb ) or 

( Block  T ransmission  Time)  s ( Frame  Time)  ( 1 ) 

In  words,  the  transmission  of  the  parameters  for  a speech  frame  must  be  completed 
in  less  than  the  frame  time.  In  most  cases  a strict  equality  will  be  chosen  in 
equation  (1),  thus 

Nb/B  - Nslfs  bps  (2) 

i.e.  B - NbfsINs  (3) 

Some  typical  values  used  in  our  simulations  are 

fs  - 8000  , Ns  - 60  , Nb  - 18  , (4) 

which  means  that  a transmission  rate  of  B * 2.4  kbps  is  required  and  that  the  frame 
duration  is  7.5  msec,  i.e.  within  a commonly  accepted  limit  on  the  rate  of  change  of 
speech  parameters. 
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Fig.  3.1  The  speech  transmission  system 


SPEECH  TRANSMISSION  SYSTEM 


3.2  Speech  Analysis  and  Parameter  Transmission 


The  computation  of  the  reflection  coefficients  can  be  carried  out  by  any  one  of 
the  algorithms  described  earlier.  For  the  actual  simulations  the  pre-windowed 
ladder  form  was  chosen  as  being  the  most  cost-effective  candidate  for  this  purpose 
(see  Section  5).  Outputs  of  the  algorithm  feed  a transmission  module  which  selects 
the  contents  of  the  per-frame  parameter  vector. 

Because  of  the  recursive  nature  of  our  algorithms,  they  can  be  run  continuously 
and  do  not  have  to  be  synchronized  to  the  frame  period.  Once  in  every  frame  period 
the  best  estimates  of  the  reflection  coefficients  are  chosen  for  transmission. 
Similarly  pitch  and  energy  information  is  passed  on  to  the  transmission  encoder. 

The  analysis  algorithm  decomposes  the  original  speech  signal  into  a model,  to  be 
represented  by  the  reflection  coefficients,  and  a driving  process,  to  be  represented 
by  pitch  and  energy.  Since  the  analyzer  operates  "continuously",  its  outputs  track 
the  original  speech.  This  permits  us  to  logically  separate  the  speech  modelling 
from  the  data  transmission.  Provided  we  transmit  the  model  parameters  "often 
enough",  we  are  at  liberty  to  choose  the  frame  size  to  optimize  transmission 
efficiency.  There  is  a tradeoff  to  be  made.  We  can  choose  a short  frame  and  send 
parameter  updates  more  often,  but  at  the  expense  of  having  fewer  bits  available  to 
do  it  with.  Alternatively,  we  can  choose  a long  frame,  providing  many  bits  to 
quantize  the  parameters,  but  at  the  expense  of  sending  fewer  parameter  updates. 
The  tracking  property  of  the  analysis  algorithm  allows  this  tradeoff  to  be  made 
independently  of  the  algorithm  design. 

3.2.1  Model  Parameters 

At  each  frame  time,  we  choose  a best  estimate  of  the  model  parameters  for  the 
frame.  Sampling  the  model  parameters  current  at  the  frame  boundary  is  appealing, 
but  we  have  adopted  a slightly  modified  strategy.  Simply  stated,  at  each  frame 
boundary,  we  transmit  the  most  recent  "reasonable"  model  parameters.  Since  the 
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algorithm  tracks  so  fast,  if  a pitch  pulse  or  algorithm  reset  occurs  very  near  the  end 
of  a frame,  a pure  sampling  strategy  would  select  parameters  not  representative  of 
the  entire  frame.  The  transmission  module  maintains  a short  history  of  the  model 
parameters;  if  a pitch  pulse  or  reset  occurs  in  approximately  the  last  third  of  the 
frame,  then  we  transmit  parameters  current  slightly  before  the  disturbance. 

This  rules  were  motivated  by  our  finding  that  in  the  voiced  case  the  reflection 
coefficients  are  piece-wise  constant.  From  a rate  distortion  point-of-view  we 
would  have  to  test  the  time- varying  model  against  a set  of  prototype  models  (that 
could  be  constant  within  a frame)  and  send  the  index  of  the  "nearest"  model  in  a 
distortion  measure. 

First  we  used  a uniform  quantization  scheme  to  test  out  our  programs  and  the 
limits  of  a simple  quantization  scheme,  acceptable  for  9600baud  transmission  rates. 
In  order  to  compare  our  models  against  other  modeling  schemes  we  selected  for  one 
set  of  experiments  up  to  10  reflection  coefficients  using  the  optimal  quantizer 
published  in  [MG],  and  bit  rates  of  3050  and  6100  just  for  the  reflection 
coefficients  were  observed.  As  a next  test  our  own  pitch  detection  algorithm  was 
used  to  test  out  a complete  transmission  system.  This  is  unfortunately  a difficult 
step  since  a separate  evaluation  of  the  modeling  part  of  the  system  can  only  be  done 
with  a perfect  pitch  detector  which  was  unavailable  to  us  at  the  time  of 
evaluation.  (This  is  somewhat  similar  to  trying  to  test  race-tires  on  a passenger 
car!) 


3.2.2  Pitch.  Information 

Pitch  detection  is  a crucial  and  difficult  part  of  most  speech  compression 
techniques.  Many  kinds  of  pitch  detection  techniques  have  been  used  (see  e.g.  the 
survey  by  Rabiner  [Rab]). 

The  ladder-form  algorithms  developed  in  this  investigation  provide  a novel 
integral  pitch  detection  method  which  seems  to  be  very  promising.  The  speech 


driving  process  consists  of  a gaussian  part  (unvoiced),  and  a jump  part  (voiced). 
The  separation  of  such  a mixed  process  may  be  described  as  a Doob-Meyer 
decomposition  into  predictable  and  Martingale  difference  components.  Such 
decompositions  have  recently  been  receiving  increased  attention  in  the  non-linear 
estimation  literature  but  almost  no  practical  results  are  available. 

A parallel  algorithm  might  be  designed  to  obtain  the  pulse  positions  from  the 
original  time  series,  but  the  non-whiteness  of  the  gaussian  component  will  act  to 
cloud  the  position  of  the  impulses.  In  our  ladder-form  algorithm,  the  time  update 
y (gamma  parameter)  is  a liklihood  ratio  directly  useful  in  separating  the  mixed 
driving  process.  The  normalized  log-likelihood  function  for  a process  parametrized 
by  its  innovations  representation  using  ladderforms  can  be  obtained  relatively 
easily  using  the  formulas  appearing  in  the  ladder  form  equations  given  in  previous 
section.  Given  a (zero  mean)  scalar  gaussian  process  with  covariance  matrix  R,  the 
determinant  |/?|  - ...  <Rip  , where  the  { Rti  } are  the  prediction  error 

covariances  related  to  the  reflection  coefficients  Ki  in  the  stationary  case  via 
- R(i  ( 1 - K?  ) . Now  the  log-likelihood  function  U can  be  obtained  using  (290  of 
Section  2 and  the  standard  formula 

U - ln\R\  + |(y||Vl  ■ fi  ln  + yP  ■ <«> 

i-0 

We  applied  several  versions  of  this  formula  to  the  pitch  detection  problem.  The 
increase  in  U per  sample,  i.e.  the  time  differenced  form  8 /Ms  a very  sensitive 
measure  of  the  "unexpectedness"  of  a time  sample,  i.e.  a measure  of  deviation  from 
non-Gaussianness.  We  have  Used  a simple  local  maximum  algorithm  either  on  6 
y or  6 U combined  with  an  exponential  threshold  detector  to  locate  the  position 
of  driving  impulses.  As  the  figures  ln  Section  5 show,  the  result  is  quite 
remarkable.  The  large  jumps  in  U or  7 itself  can  clearly  be  related  to  the  start  of  a 
new  word  or  even  phoneme.  Eventhough  this  research  was  not  directed  towards 
pitch  detection,  we  discovered  that  a high  quality  pitch  detection  scheme  was 
required  in  order  to  test  our  modeling  methods.  The  likelihood  approach  is 
extremely  well  matched  to  our  modeling  approach,  so  some  effort  was  directed 
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towards  the  development  of  this  method.  However  we  consider  this  as  only  a first 
attempt  in  the  direction  of  finding  methods  to  decompose  (discrete)  mixed 
processes.  Such  results  would  hopefully  avoid  the  use  of  any  more  or  less  ad-hoc 
schemes  containing  internal  "tweeking"  factors  of  heuristic  nature,  a characteristic 
of  most  present  day  pitch  detectors. 

The  end  result  of  all  this  is  that  the  speech  analysis  algorithm  includes  an 
integral  pitch  detector  which  provides  a pitch-pulse-present  signal  at  every  sample 
time.  In  our  transmission  module,  we  have  adopted  the  strategy  of  transmitting 
the  time  indices  of  these  pulses.  In  a loose  sense,  we  are  using  a run-length  coding 
scheme  to  transmit  the  sparse  one-bit-per-sample  sequence  which  represents  the 
pulse  locations.  An  important  result  of  this  technique  is  that  there  is  no  need  to 
make  the  usual  voiced/unvoiced  decisions. 

3.2.3  Energy 

The  ladder-form  residuals,  or  innovations,  contain  the  energy  of  the  driving 
process.  Our  transmission  module  computes  the  frame  average  driving  source 
power  by  summing  the  energy  of  the  innovations  of  the  original  signal  over  each 
frame. 


Figure  3.2  Illustration  of  Log-Likelihood  Computations  for  Pitch  Detection 


3.3  Quantization  Issues 


For  comparison  reasons  we  used  the  uniform  single  symbol  quantization  scheme 
given  in  [MG].  At  80  samples  per  frame  and  10th  order  models  the  61  bits  per  frame 
lead  to  a transmission  rate  of  6100bits  per  second  for  the  K’s  only.  The  speech 
degradation  using  this  quantization  was  not  perceptible  in  our  experiment.  A few 
trials  were  made  with  rate-distortion  quantizers  at  8 bits  per  frame  for  the  K's,  i.e. 
about  a reduction  of  a factor  of  8 in  transmission  rate  for  the  K's  alone.  The 
prototype  models  needed  in  the  rate-distortion  have  to  be  trained  to  the  particular 
method  and  a collection  of  representative  speakers.  Nevertheless,  in  one  of  our 
experiments  we  encoded  our  model  parameters  obtained  without  pre-emphasis 
with  respect  to  a set  of  prototypes  trained  on  a different  speaker  set  using  a 
standard  LPC  system  with  pre-emphasis,  see  [MG].  Using  also  a subgrade  pitch 
detector,  the  result  was  surprisingly  good  considering  the  fact  that  only  8 bits  (i.e. 
256  prototypes)  were  used.  We  do  not  consider  this  a conclusive  experiment,  but 
rather  a strong  indication  that  these  methods  are  very  promising  and  have  the 
potential  to  achieve  the  desired  low  transmission  rates.  Pitch  information  and 
energy  parametrization  can  be  handled  similarly,  if  our  modeling  techniques  can  be 
extended  to  produce  somewhat  more  accurate  pitch  period  estimates  than  this  first 
attempt.  The  raw  pitch  period  estimates  obtained  now  would  have  to  be  smoothed 
over  several  pitch  periods  and  rate-distortion  encoded.  Energy  or  gain  can  be 
handled  relatively  simply  via  rate-distortion  encoding,  i.e.  similarly  to  the  K's. 
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3.4  Speech  Synthesis 


/ 


The  speech  synthesis  uses  the  frame  synchronous  parameter  vector  to  set  the 
model  parameters  of  a ladder  form  filter  and  to  set  up  the  appropriate  driving 
process.  The  model  creation  is  straightforward.  There  is  no  need  to  modify  the 
reflection  coefficients  to  create  the  appropriate  inverse  filter  as  a ladder  form  may 
be  inverted  by  reversing  its  signal  flow  graph  and  running  it  backwards. 

The  synthesis  module  creates  a driving  process  based  on  the  per-frame  pulse 
position  and  energy  information  in  the  parameter  vector.  If  no  pulses  are  present  in 
a frame,  the  source  energy  is  put  into  a gaussian  noise  component  - i.e.  an  unvoiced 
decision  is  made.  If  pulses  are  found  the  source  energy  is  distributed  equally 
among  them.  Ultimately  the  driving  process  could  be  a mixture  of  a gaussian  and 
jump  component,  each  with  its  own  gain  (and  even  its  own  linear  filter.) 

Many  future  practical  problems  remain,  such  as  inter-  and  intra-frame 
smoothing  and  dividing  up  the  energy  between  gaussian  and  jump  components. 
Again,  ideally  a time- varying  rate  distortion  encoder  could  take  care  of  this. 
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4.  Software  Tools 


The  similarity  between  the  various  speech  processing  algorithms  suggested  that 
they  be  written  as  simple  modules  which  "plug"  into  a framework  that  supplies  I/O 
facilities,  graphics,  and  debugging  tools.  The  resulting  algorithm  module  expresses 
the  essence  of  the  algorithm,  and  removes  the  extraneous  code  required  to  interface 
the  algorithm  to  some  particular  machine  environment.  We  have  also  attempted  to 
WTite  the  framework  as  well  as  the  modules  in  a machine-independent  fashion. 
This  necessitates  a machine-independent  language,  and  we  have  chosen  MAINSAIL 
[Wil],  a dialect  of  Algol  patterned  after  SAIL  [Reis]. 


4. 1 A Machine-Independent  Language 

Unlike  Fortran,  which  is  not  actually  portable,  MAINSAIL  (MS  for  short)  was 
expressly  designed  to  allow  program  portability  to  any  reasonable  machine  with  a 
word  length  of  16  bits  or  larger.  The  compiler  itself  is  written  in  MS,  and 
currently  generates  code  for  PDP-lOs  and  PDP-lls.  Support  for  Data  General 
machines,  IBM  370s,  Interdatas,  and  other  machines  is  planned.  We  expect  that  the 
module  sources  will  be  very  portable  as  a result.  The  frame  and  existing  modules 
are  written  in  MS,  but  are  now  compiled  with  SAIL  because  the  MS  compiler  is  not 
well  supported  at  SUAI  yet.  The  PDP- 1 1 UNIX  version  of  MS  is  expected  soon.  A 
library  of  SAIL  macros  and  procedures  is  currently  used  to  simulate  MS  syntax  and 
its  runtime  environment.  Preliminary  experience  indicates  that  about  5 minutes  of 
editing  is  required  to  convert  one  of  these  modules  to  the  real  MS  syntax.  Programs 
which  use  more  extensive  syntax  require  more  effort. 

The  MS  language  supports  integers  and  long  integers  with  guaranteed  minimum 
sizes  of  16  and  32  bits,  respectively.  The  long  integer  construct  is  essential  here 
for  sample  numbers,  etc.  MS  supports  reals  and  long  reals  with  minimum  sizes  of  6 
and  1 1 decimal  digits,  respectively,  with  base  1 0 exponent  ranges  of  plus  or  minus 
36.  There  are  garbage-collected  strings  ala  SAIL  and  PL/1,  and  garbage-collected 


records  and  pointer  variables.  An  Important  point  is  that  I/O  is  carefully  defined  so 
that  both  text  and  data  files  can  be  randomly  accessed  in  a way  which  is  relatively 
independent  of  the  machine  and  operating  system.  The  data  files  themselves  are 
not  directly  portable,  but  programs  which  read  and  write  data  files  are 
functionally  equivalent  on  different  machines. 
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4.2  Frame  and  Module  Structure 

Each  algorithm  module  is  separately  compiled  and  dynamically  linked  with  the 
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framework  at  run-time  by  the  MS  run-time  system.  This  costs  very  little,  since 
each  MS  module  is  position-independent,  and  accesses  data  and  procedure  fields  in 
other  modules  via  a polnter/field  addressing  scheme.  It  also  means  that  very  large 
programs  can  be  supported  in  a small  memory  space  by  breaking  them  into  modules 
which  are  demand  swapped  into  memory  by  the  run-time  system.  This  requires 
that  the  machine  architecture  support  position-independent  code  so  that  modules 
may  be  loaded  anywhere  without  relocation.  MS  code  modules  are 
garbage-collected  Just  like  records  when  space  is  needed. 

Each  algorithm  module  consists  of  5 procedures  and  some  internal  data  fields. 
The  procedures  are  named  defineExp,  initialize,  analyze,  quantize,  and  synthesize. 
The  frame  calls  these  when  appropriate;  its  main  processing  loop  has  the  following 
skeleton  structure: 

initialize;  • set  up  initial  state 

for  block  *■  0 upto  N blocks- 1 do 


begin 

read  input  data  block; 
analyze ; 
quantize; 
synthesize, 

write  output  data  block; 

end; 


• input  speech  signal 

• linear  prediction 

• parameter  transmission 

• reconstruct  speech 

• output  speech  8c  diagnostics 


1 


I 


j 


i 


I 


A block,  would  be  just  a single  sample  for  an  on-line  method.  This  structure 
represents  a sub-experiment;  it  is  enclosed  in  a loop  which  allows  the  initialize 
procedure  to  change  parameters  which  must  remain  fixed  while  reading  the  input 
speech  file,  such  as  block  size.  The  same  speech  file  is  re-read  for  each 
sub-experiment,  and  the  output  speech  file  is  a concatenation  of  the  results  of  each 
sub-experiment.  The  defineExp  procedure  is  called  at  the  very  beginning  of  the 
total  experiment  to  set  initial  parameter  values. 

The  SU-AI  SAIL  system  does  not  support  the  MS  module  concept,  so  a similar 
structure  was  worked  out  that  is  compatible  with  SAIL  and  upgradable  to 
MAINSAIL.  Essentially  each  module  is  a separate  program,  e.g.  analyze,  quantize,  and 
synthesize.  The  communication  between  the  "modules"  is  via  data  files.  A number 
of  other  programs  have  been  written  to  manipulate  these  data  files,  including  an 
interactive  graphing  program.  The  internal  structure  of  each  program  is  still  clean, 
however,  since  the  I/O  required  is  simple  (it  emulates  the  MS  1/0  system).  Also,  the 
modules  are  broken  into  several  procedures  for  clarity,  and  structured  coding 
practices  were  used  to  maximize  the  readability. 

The  flow  of  control  in  each  module  can  be  better  ascertained  from  the  MAINSAIL 
(MS)  source  than  from  a flowchart-type  description,  which  does  not  convey  the 
hierarchical  structure.  The  flowchart  is  an  attempt  to  document  the  flow  of 
control  in  a program  which  has  poor  structure.  With  languages  like  Fortran, 
where  undisciplined  use  of  GOTOs  and  three-way  IF  statements  obscure  the 
control-flow,  the  flowchart  may  provide  some  information.  But  with 
block-structured  languages  like  MS  and  SAIL,  the  control  flow  is  most  naturally 
described  using  the  language  features  provided  for  structured  control,  namely 
WHILE,  FOR,  CASE,  and  IF-THEN-ELSE.  In  addition,  the  variable  names  have  more 
meaning  because  both  upper  and  lower  case  are  allowed,  and  may  be  any  length. 
Flowcharts  were  not  made  for  the  programs  included  in  the  appendix  because  such 
a flowchart  is  more  difficult  to  understand  than  the  actual  sources. 
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5.  Simulation  Results 


In  this  section  we  present  results  of  the  simulations  carried  out  to  test  and 
verify  our  algorithms. 


5.1  Constant  parameter  AR  models 
We  first  present  simulation  results  of  the  analysis  of  data  generated  by 
time-invariant  models  using  the  pre-windowed  ladder  form  and  the  Levinson 
algorithm.  Both  algorithms  were  applied  to  autoregressive  (AR)  systems  of  various 
orders.  Reflection  coefficients  were  computed  on-line  in  the  pre-windowed  ladder 
form.  In  the  Levinson  algorithm,  reflection  coefficients  were  computed  with  data 
blocks  of  increasing  blocksi2e,  a excellent  method  for  testing  block  methods  for 
consistency. 

Data  generated  from  two  AR  models,  4th  and  8 th  order,  were  tested.  Zero-mean 
unit  variance  white  noise  input  was  used.  The  parameters  for  the  8th  order  model 
were  taken  from  [MG,p.l28], 

The  model  parameters  ( A coefficients  ) and  their  corresponding  reflection 
coefficients  for  the  two  models  are  shown  in  Table  4.1  and  output  data  are  shown 
in  Figures  4.1  a,b. 

5.1.1  Simulation  Results 

A large  number  of  simulations  were  performed  on  the  most  promising 
algorithms  and  a few  characteristic  examples  are  presented  here.  For  each 
algorithm,  1000  samples  of  data  were  analyzed  and  the  estimates  of  the  reflection 
coefficients  were  plotted  as  a function  of  number  of  samples.  All  figures  have  time 
or  number  of  samples  as  abscissa,  where  "1"  corresponds  to  1000  samples.  The 
ordinate  is  the  scale  for  the  values  of  the  various  estimates  of  the  reflection 
coefficients,  using  an  automatic  scaling  of  maximal  and  minimal  value 


corresponding  to  the  top  and  bottom  of  the  graph.  A horizontal  line  indicates  the 
true  value  of  the  estimated  reflection  coefficients;  hence  the  asymptotic  behavior 
or  convergence  of  the  estimates  as  a function  of  time  can  easily  be  inspected. 

In  order  to  eliminate  the  irregularities  in  the  estimates  caused  by  block, 
boundaries,  some  windowing  was  applied  to  the  data  within  each  block. 

A trapezoidal  window  of  the  following  form 

w{n)  ■ 7i  / 10  , for  1 s 7i  S 9 

- ( T - 71  + 1 ) / 10,  for  ( T - 9 ) s tz  s T 

m 1 , otherwise 

and  a Hamming  window  of  the  following  form 

iu(ti)  - 0.54  - 0.46  cos  ( 2 7T  ^ — j- ) 

were  used. 

Estimated  Parameters  of  4th  Order  Model 
In  order  to  illustrate  and  compare  the  convergence  properties  of  each  algorithm, 
the  estimated  reflection  coefficients  from  each  algorithm  were  collected  and 
displayed  at  each  prediction  order.  The  results  are  shown  in  Figures  4.2  - 4.5.  In 
each  figure,  the  displays  are  arranged  in  the  following  order 

(a)  pre-windowed  ladder  form  K*  T , 

(b)  pre-windowed  ladder  form  K rp  f , 

(c)  Levinson  Kpj  without  windowing, 

(d)  Levinson  Kp  j with  trapezoidal  window,  and 

(e)  Levinson  Kp  j with  Hamming  window. 

The  algorithms  were  also  extended  to  estimate  beyond  4th  order  upto  8th  order, 
and  the  results  are  shown  in  Figures  4.6  - 4.9. 

Estimated  Parameters  of  8th  Order  Model 
The  results  for  the  8th  order  model  were  arranged  in  the  same  fashion  as  for  the 
4th  order  model,  and  they  are  shown  in  Figures  4.10-4.17 


5.1.2  Discussion  of  Results 


A comparison  of  the  various  simulations  shows  the  following  characteristics. 

1.  Comparing  the  estimates  of  the  reflection  coefficients  from  the 
pre-windowed  ladder  form  and  those  from  the  Levinson  algorithm,  it  can  be 
readily  seen  that  that  the  latter  estimates  have  a higher  variance,  i.e.  they  are  very 
"noisy".  Closer  inspection  reveals  that  the  pre-windowed  estimates  match  one  of 
the  "envelopes"  of  the  noisy  estimates  computed  via  the  Levinson  recursions; 
furthermore  it  is  the  envelope  closer  to  the  true  estimates.  Hence  the  Levinson 
estimates  appear  to  be  uniformly  worse  than  the  pre-windowed  estimates  in  terms 
of  mean  bias. 

2.  The  convergence  of  both  pre-windowed  and  Levinson  recursions  seem  to 
depend  somewhat  on  the  index  of  the  reflection  coefficients.  The  pre-windowed 
estimates  converge  in  general  to  within  10%  on  less  than  250  samples,  (i.e.  a 
typical  blocksize  for  speech  analysis).  However  the  Levinson  recursions  produce 
estimates  that  are  so  "noisy"  that  only  non-linear  schemes  could  improve  them 
because  of  the  asymmetry  of  the  irregularities.  (The  pictures  suggest  the  idea  of 
"weeds"  growing  away  from  the  best  estimates  represented  by  the  envelope  closer 
to  the  true  parameter  value). 

3.  In  order  to  reduce  the  high  variance  of  the  estimates  obtained  from  the 
Levinson  recursions,  windowing  has  traditionally  been  used.  Our  simulations 
using  a Hamming  window  show  unfortunately  that  windowing  can  quite 
drastically  alter  the  dynamics  of  the  estimates.  The  initial  transients  are  completely 
different  even  with  a trapezoidal  window  which  affects  only  peripheral  points  ( n 
m 10  ).  Even  more  drastic  differences  are  produced  by  the  use  of  the  Hamming 
window.  It  appears  that  the  transients  are  now  a complicated  convolution  of  the 
original  response  and  the  window  function.  Smoothing  is  achieved  for  some 
parameters  but  at  the  cost  of  new  transients  lasting  over  500  samples,  or  more  than 
twice  the  commonly  used  blocksize.  This  would  lead  to  biases  in  the  estimates  of 
the  reflection  coefficients  and  would  ultimately  result  in  "rough"  speech. 

4.  In  light  of  these  results,  several  alternatives  will  be  explored.  The 
pre-windowed  method  can  be  easily  modified  to  use  an  exponential  weighting  or 


I 


scaling  so  that  the  most  recent  data  contributes  most  to  the  estimates.  The 
convergence  variance  and  bias  of  the  estimates  can  be  studied  via  the  forward  and 
backward  prediction  errors.  The  Levinson  recursion  without  windowing  can  serve 
as  a standard  to  compare  other  forms  like  pre-windowed,  "covariance"  and  Burg 
type  estimates.  The  somewhat  ad  hoc  windowing  commonly  used  can  lead  to 

' 

drastic  changes  in  the  transient  behavior  of  these  algorithms,  and  should  therefore 
be  used  with  great  caution!  Methods  which  don’t  require  windowing,  such  as  the 
ladder  - forms,  avoid  this  problem  altogether. 
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5.4 


4th  order  AR  process 


Sth  order  AR  process 


i 

A[i] 

K[i] 

0 

1.000000 

.0000000 

1 

-1.31 3600 

-.7424092 

2 

1.440100 

.8082622 

3 

-1.091900 

.01756615 

4 

.8352700 

.8352700 

i 

A[j] 

K[i] 

0 

1.000000 

.0000000 

1 

-2.346440 

-.9421574 

2 

1.656970 

.9238739 

3 

-.005990000 

-.5619754 

4 

.3230500 

-.09454902 

5 

-1.482130 

.2021682 

6 

1.154630 

.5359436 

7 

-.1896600 

-.3292221 

8 

-.05899000 

-.0589900( 

Table  4.1  Model  parameter  and  reflection  coefficients  of  AR  models. 


(c)  Reflection  Coefficient,  Levinson  Algorithm  without  windowing 


(d)  Reflection  Coefficient;  Levinson  Algorithm  with  Trapezoidal  window 


(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 


Figure  4.2  First  order  reflection  coefficient  of  4th  order  AR  process. 
All  algorithms  give  reflection  coefficients  that  converge  within 
100  samples.  Notice  the  smoothing  effect  of  the  two  windowed  estimates. 


(d)  Reflection  Coefficient, Levinson  Algorithm  with  Trapezoidal  Window 
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(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 

Figure  4.3  Second  order  reflection  coefficient  of  4th  order  process. 

Both  reflection  coefficients  from  Pre-windowed  Ladder  Form  converge  to 
the  true  value  in  250  samples.  Also,  estimate  from  Levinson  Algorithm 
without  windowing  converges  in  the  same  fashion  as  the  Pre-windowed 
Ladder  Form  . Both  of  the  windowed  estimates  exhibit  smoothing  effects 
and  in  particular,  the  Hamming  window  introduces  biased  behaviour 
at  around  250  samples  region. 
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(c)  Reflection  Coefficient , Levinson  Algorithm  without  windowing 
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(d)  Reflection  Coefficient,  Levinson  Algorithm  with  Trapezoidal  window 


(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 

Figure  4.4  Third  order  reflection  coefficient  of  4th  order  AR  process. 
Notice  the  noise-like  spikes  appearing  in  the  estimates  obtained 
from  Levinson  Algorithm  without  windowing.  The  picture  suggests 
the  idea  of  "weeds"  growing  away  from  the  best  estimates  represented 
by  the  envelope  closer  to  the  true  parameter  value. 
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(d)  Reflection  Coefficient,  Levinson  Algorithm  with  Trapezoidal  window 


(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 


Figure  4.5  Fourth  order  reflection  coefficient  of  4th  order  AR  process. 

Notice  how  the  Hamming  window  alters  the  transient  behaviour  during 
the  first  250  samples. 


(c)  Reflection  Coefficient,  Levinson  Coefficient  without  windowing 
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(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 

Figure  4.6  Fifth  reflection  coefficient  of  4th  order  AR  process. 

Bias  was  observed  on  all  algorithms.  Notice  the  drastic  "weeds" 
appearing  in  the  estimates  from  Levinson  Algorithm  without  windowing 
and  the  smoothing  effects  of  the  two  windows. 
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(d)  Reflection  Coefficient,  Levinson  Algorithm  with  Trapezoidal  window 


(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 
Figure  4.7  Sixth  order  reflection  coefficient  of  4th  order  AR  process. 

All  algorithmsgive  estina  tes  that  converge  to  zero,  the  true 

value  in  100  samples. 
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(c)  Reflection  Coefficient,  Levinson  Algorithm  without  windowing 
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(d)  Reflection  Coefficient,  Levinson  Algorithm  with  Trapezoidal  window 


(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 
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Figure  4.9  Eighth  order  reflection  coefficient  of  4th  order  AR  process. 
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(e)  Reflation  Coefficient,  Levinson  Algorithm  with  Hamming  window 

Figure  4.11  Second  order  reflecton  coefficient  of  8th  order  AR  process. 
Both  reflection  coefficients  of  the  Pre-windowed  Ladder  Form  converge 
to  the  true  value  with  100  samples.  In  the  Levinson  algorithm  without 
windowing,  the  envelope  also  converges  in  the  same  rate.  In  both 
of  the  windowed  cases,  the  initial  transients  were  altered.  In  the 
Hamming  window  case,  bias  in  the  estimates  are  apparent. 
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(d)  Reflection  Coefficient,  Levinson  Algorithm  with  Trapezoidal  window 


(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 

Figure  4.  12  Third  order  reflection  coefficient  of  8th  order  AR  process. 
Notice  the  drastic  eff©cts  of  "weeds"  in  the  estimates  of  Levinson 
Algorithm  with  windowing. 


(d)  Reflection  Coefficients,  Levinson  Algorithm  with  Trapezoidal  window 
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(e)  Reflection  Coefficient,  Levinson  Algorithm  with  Hamming  window 


Figure  4.13  Fourth  order  reflection  coefficient  of  8th  order  AR  process 
Bias  is  observed  for  all  estimztej. 


(c)  Reflection  Coefficient,  Levinson  Algorithm  without  windowing 


5.2  Time-Varying  AR  models 


5.2.1  Results  on  Simulated  Vocal  Tract  Behavior 
This  set  of  tests  is  to  demonstrate  the  tracking  ability  of  the  PW  ladder  form. 
Two  sets  of  data  were  generated  from  an  8th  order  AR  model  with  time-varying 
(piecewise  constant)  reflection  coefficients  that  converge  exponentially  to  zero. 
The  first  set  of  data  was  obtained  by  driving  the  input  with  Gaussian  white  noise 
only.  The  second  was  obtained  by  driving  the  input  with  Gaussian  white  noise  plus 
a pulse  train.  The  pulses  occur  at  the  step  changes  of  the  reflection  coefficients,  and 
thus  closely  simulate  actual  vocal  tract  and  speech  behaviour.  Figure  1(a)  and  1(b) 
show  the  actual  data,  figures  2-9  show  the  convergence  of  all  eight  of  the 
reflection  coefficients.  Figure  10  shows  the  behavior  of  the  likelihood  quantity 
gamma7,  suggesting  its  possible  role  in  a pitch  detection  scheme. 

The  data  sets  are  each  2048  samples  long.  The  step  changes  occur  at  every  128 
samples.  The  time-constant  of  the  tracking  rate  was  set  at  100,  i.e.  at  about  the  rate 
of  the  changes. 

Observations 

Observe  that  for  the  unvoiced  data,  the  estimated  reflection  coefficients  all  tend 
to  fluctuate  when  they  are  close  to  zero.  While  for  voiced  data,  the  estimated 
reflection  coefficients  converge  uniformly  to  the  true  piecewise  constant  behavior. 
A carefull  study  of  the  log-likelihood  function  as  described  in  the  context  of  pitch 
detection  and  the  conditioning  of  the  underlying  covariance  matrix  explains  these 
phenomena.  The  log-likelihood  function  is  proportional  to  /n(l  - K?),  i.e.  if  the 
magnitude  of  the  K^s  are  close  to  one,  the  log-likelihood  is  a very  sensitive 
function  or  conversely  if  the  K's  are  small  they  are  not  very  important,  therefore 
they  are  harder  to  estimate  with  a least-squares  criterion.  An  other  observation  is 
that  if  the  very  first  sample  is  a pulse,  this  Implies  that  the  problem  is 
ill-conditioned.  Simulations  show  that  if  the  first  few  samples  are  gaussian  type 
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Fig.  4(e).  Estimated  «3  of  non-gaussian  input,  using  time-varying 
weighting  factor. 
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Fig. 5(a)  Fourth  reflection  coetticient,  , ot  underlying  model. 
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Fig. 5(b)  Estimated  of  unvoiced  data  (generated  oy  unite  noise  only) 


Fig.  6(e).  Estimated  Kg  of  non-gaussian  input  using  time-varying 
weighi tng  factor. 
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5.2.2  Results  on  Real  Speech.  Data 

5.2.2(a)  Local  Behaviour  of  Reflection  Coefficients 
This  test  is  to  demonstrate  the  dynamic  behaviour  of  the  estimated  reflection 
coefficient  during  a segment  of  voiced  speech  data.  Two  sets  of  speech  data  were 
used.  The  first  set  is  taken  from  a series  of  high  resolution,  sampling  rate  of  20,000 
samples  per  second,  speech  data.  The  data  was  the  vowel  /e/  in  the  word  "the". 
This  test  will  illustrate  the  local  behaviour  of  reflection  coefficients  within  a few 
pitch  periods.The  total  number  of  samples  in  the  segment  is  960  and  the  time 
constant  of  the  tracking  rate  was  set  at  300.  The  estimated  reflection  coefficient, 
the  normalized  prediction  errors  or  innovations,  and  gamma,  which  is  part  of  the 
likelihood  ratio,  were  illustrated  for  the  following  order:  1st,  5th,  10th,  and  15th. 

Observations 

The  15th  order  PARCOR  coefficient  K[15]  converges  to  almost  zero  for  voiced 
sounds.  Reflection  coefficients  clearly  show  piecewise  constant  behavior. 
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6.2.2(b)  Global  Behaviour  of  Reflection.  Coefficients 
In  this  set  of  simulations,  speech  data  sampled  at  8000  samples  per  second  was 
used.  The  sentence  is  "the  pipe  began  to  rust  while  new”,  female  voice.  In  this  set 
of  simulations,  the  algorithm  was  set  to  track  starting  at  the  first  non-zero  input 
and  was  stopped  at  the  last  input  sample.  The  various  illustrations  described  below 
are  just  exerpts  from  the  entire  run  in  order  to  demonstrate  some  of  the  more 
interesting  features.The  tracking  time  constant  was  set  at  160  samples,  i.e. 
equivalent  to  two  frame  widths. 


Formant  Transition 

The  first  set  of  illustrations  show  the  behaviors  of  the  reflection  coefficients  up 
to  to  13th  order  during  a formant  transition  which  occurs  at  about  sample  6150. 
Observe  the  variations  of  the  reflection  coefficients  during  the  change. 


Phoneme  String 

In  this  set  of  illustrations,  reflection  coefficients  of  the  entire  word  "pipe"  are 
displayed,  showing  the  changes  in  reflection  coefficients  during  the  entire  passage. 


5.9 


Original  sentence:  "the  pipe  began  to  rust  while  new", 
female  speaker,  sampling  rate:  8,000Hz. 

"i 
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5.2.2(c)  Pitch  Detection  Scheme 

The  first  set  of  illustrations  decribes  a pitch  detection  scheme  that  is  based  on 
the  likelihood  ratio  parameters,  namely  Y (gamma  parameters). 

The  second  set  shows  the  results  of  using  a different  implementation  of  the  log 
likelihood  ratio  in  detecting  the  pitch  of  a different  segment  of  voiced  speech, 
namely  a)  Yp  only,  b)  Yp  plus  /n-fl*p  part  and  c)  the  true  likelihood  ratio, 
indicating  the  progressive  contributions  of  the  In  part.  (c.f.  likelihood  formulas 
above.) 


5.10 


6.2.2(d)  Global  Behaviour  of  Log  likelihood  function. 

This  illustration  shows  the  global  picture  of  the  log  likelihood  ratio  over  the 
whole  sentence,  and  its  potential  for  segmenting  speech. 


6.  Conclusions 


The  results  of  the  initial  testing  of  our  new  speech  modeling  algorithms  have 
been  very  promising.  These  algorithms  seem  to  be  very  well  suited  for  speech 
modeling  as  indicated  by  the  "constant  parameter"  behavior  during  the  pitch 
period,  good  tracking  capability  and  a novel  likelihood  ratio  based  pitch  detection 
technique  associated  with  the  ladder  form.  Their  performance  is  very  promising 
both  in  terms  of  achievable  speech  quality  and  potential  for  data  rate  reduction. 

Due  to  the  short  term  and  limited  scope  of  this  research  effort,  our  results  can 
only  be  considered  preliminary.  Further  research  and  testing  is  needed,  particularly 
in  the  area  of  pole-zero  (ARMA)  modeling  and  parameter  quantization 
(coding/decoding)  methods.  The  pitch  detection  problem  (not  part  of  the  scope  of 
this  project)  also  requires  further  study  and  refinements,  especially  in  the  context 
of  mixed  processes. 

It  is  expected  that  improved  system  performance  can  result  by  pursuing  our 
approach  further  and  by  performing  more  extensive  testing  to  "tune"  the 
transmission  system.  Real-time  special-purpose  processor  hardware  would  be 
needed  to  effectively  test  various  algorithms  that  have  shown  promise  with  a 
larger  data  base.  The  current  development  and  simulations  are  clearly  processor 
limited  if  done  on  a general  purpose  time-sharing  system. 

On  a more  general  level,  a number  of  important  research  problems  are 
incompletely  resolved  at  this  point  and  need  further  study.  They  are  not  just  of 
interest  to  speech  modeling  since  they  include  basic  pertaining  to  representations 
of  mixed  processes  such  as  Gaussian  and  poisson-type  jump  processes  (potentially 
useful  for  modeling  voiced  speech).  Good  performance  measures  for  speech 
transmission  systems,  such  as  acceptable  distortion  measures,  and  the  establishment 
of  the  basic  limits  of  speech  compression,  l.e.  the  distortion  rate  function  of  speech 
processes,  are  needed  for  further  investigation  in  speech  modeling. 
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APPENDIX  A 


A Classification  of  Algorithms 

for 

ARMA  Models  and  Ladder  Realizations  * 


M.  Morf,  D.  T.  Lee,  J.  R.  Nickolls  and  A.  Vieira 

Information.  Systems  Laboratory 
Stanford  University,  Stanford,  CA  94305 


Abstract 

Applications  of  linear  systems  modeling  have  recently  developed  quite  rapidly  in 
speech  modeling,  seismic  data  processing,  and  other  areas.  Due  to  the  diversity  of 
these  developments,  there  exists  a plethora  of  methods  for  estimating  the 
parameters  of  linear  models  given  input-output  data,  transfer  functions,  or 
covariance  functions.  This  paper  attempts  a systematic  classification  of  existing 
least-squares  modeling  methods.  Within  this  framework,  we  shall  point  out  some 
recently  developed  algorithms  that  have  many  computational  advantages  over 
existing  ones. 

In  particular,  the  methods  of  interest  will  be  classified  according  to  how  the 
input/output  data  is  acessed  and  according  to  its  type.  Data  can  be  accessed  either 
sequentially  or  in  blocks;  the  data  can  be  either  input/output  signals,  transfer 
functions,  or  covariance  functions.  Since  we  consider  state-space, 
autoregressive-moving  average  models,  and  the  related  ladder  realizations,  we  shall 
distinguish  the  following  three  classes  of  algorithms:  Riccati  or  square-root  type 
methods,  recently  developed  "fast"  algorithms,  and  their  ladder  forms.  While  the 

first  class  typically  requires  computations  of  0(n3)  or  0(n2)  with  n equal  to  the 
number  of  model  parameters,  the  "fast"  forms  only  require  operations  and  storage 
of  O(n).  The  ladder  realizations  have  several  advantages,  such  as  lowest 
computational  complexity  and  their  stability  "by  inspection"  properties. 

In  the  appendices,  we  present  an  example  of  our  new  exact  least-squares 
recursions  for  ladder  forms,  and  show  how  to  obtain  stable  partial  mini™*! 
realizations  of  the  joint  impulse  response  - and  covariance  - matching  type. 


This  work  was  supported  by  the  Defense  Communications  Agency  under  contract 
DC/1100-77-C-0Q05;  the  National  Science  Foundation  under  Contract  NSF  £n*75-189S2;  the  Air 
Force  Office  of  Scientific  Research,  Air  Force  System*  Command  under  Contract 
AF 44-620-74-C-0068;  the  Joint  Services  Electronics  Program  under  Contract  /V00014-75-C-0601; 
the  I nsitiute  of  Biomedical  Technology  in  Zurich  Switzerland-,  and  by  ARP  A through  the  use 
of  the  Stanford  Artificial  Intelligence  Laboratory  facilities. 
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I.  Introduction 


The  long  history  and  widespread  use  of  linear  modeling  [K-S74]  has  resulted  in 
many  independently  developed  methods  for  determining  such  models.  We  attempt 
a classification  of  exact  least -squares  procedures  that  are  recursive  and  optimal  in 
some  sense,  and  discuss  some  recently  developed  methods  that  have  computational 
and  structural  advantages  over  existing  ones.  We  shall  only  indicate  examples  of 
the  much  larger  class  of  suboptimal  or  approximate  methods. 

In  Section  II  we  introduce  the  modeling  problem  by  reviewing  external  and 
internal  (linear)  models,  and  consider  the  different  types  of  observed  data.  We 
then  introduce  a systematic  classification  in  tableau  form  of  the  various  methods 
to  be  discussed.  It  should  be  stressed  here  that  these  least-squares  modeling 
methods  can  in  general  only  determine  the  unique  innovations  representation 
model  [K-S74].  The  parameters  of  this  model  are  chosen  to  produce  behavior 
statistically  eqtii valent  to  the  observed  data. 

In  Section  III  we  consider  batch  methods  that  are  best  suited  for  cases  where  data 
is  accessed  in  blocks.  This  situation  typically  occurs  when  the  data  is  in  the  form 
of  a covariance  function  or  transfer  function. 

In  Section  IV  recursive  (in  time)  algorithms  that  access  the  input/output  signals 
sequentially  are  discussed.  In  the  control  context  they  are  considered  on-line 
methods  [AE],  [MKL].  In  both  sections  III  and  IV  we  point  out  the  fast  versions 
which  take  advantage  of  certain  matrix  properties. 

In  Section  V the  ladder  (or  lattice)  type  realizations  of  the  fast  algorithms 
discussed  in  Sections  III  and  IV  are  introduced  [IS],  [Mo].  These  new  methods  have 
several  nice  properties  from  both  the  theoretical  and  application  viewpoints. 

In  Appendix  A we  show  how  to  obtain  stable  partial  minimal  realizations  of  the 
joint  impulse  response-  and  covariance- matching  type.  In  Appendix  B,  we  present 
an  example  of  our  new  exact  least-squares  recursions  for  ladder  forms. 


II.  The  Modeling  Problem 


The  many  different  types  of  linear  models  can  be  classified  into  "external"  or  ’ 
input/output  descriptions,  and  "internal"  or  state-space  descriptions.  We  will  first 
consider  "external"  descriptions,  which  are  sometimes  referred  to  as  transfer 
function  type  models. 

Let  us  consider  a finite-dimensional  linear  system  (FDLS)  with  inputs  { u(«) } and 
outputs  { y(‘) }.  The  outputs  represent  sampled  data,  such  as  speech  where  y's  are 
scalars,  or  seismic  signals  from  a geophone  array  where  y’s  are  r-vectors.  The 
input-output  relationship  can  be  described  by  an  autoregressive  - moving  average 
(AHMA)  model, 

+ <V»-1  + •'  • • + Vi-n  - . <2.  la) 

®j  - fy)u» + •••  *bqui^  • ibO,  nxjiO,  (2.1b) 

where  { w(-) } is  a moving  average  of  a white  noise  sequence  { u(*) } and  the  values 

{?-!•  • • • • J-*}  and  { «-i u-q  1 initial  conditions.  The  modeling 

problem  here  is  to  determine  the  model  parameters  ai  and  bi  . Applying  the 
z-transform 

oo 

iM  - £ Ji  (2.2) 

>-0 

to  equation  (2.1)  yields 


a(z)y(z)  « b(z)u(z)  + [ terms  involving  the  initial  conditions  } , (2.3a) 

«(z)  - Z*  + ajs*-1  + ...  + an  , (2.3b) 

Kz)  - bQZn  + ftjz""1  + . . . + t>qzn-*  . (2.3c) 

With  zero  initial  conditions  and  scalar  processes,  the  ratio  of  y{z)  and  u(z)  gives  the 
transfer  function  7*(z)  - b(z)la(z ) . When  6<z)  - z*  , * z 0 then  { tu(-) } is  a white 
noise  sequence  and  { jK*) } is  called  an  autoregressive  (AR)  process;  the  model  is 
referred  to  as  all- pole.  Alternatively,  when  a(z)  - z",  { y(>) } is  a moving  average 
(MA)  process  and  an  all-zero  model  is  obtained. 


Turning  to  "internal"  models  of  the  FDLS,  we  can  consider  { y{‘) } as  being 
generated  from  { u(<) } with  a suitable  initial  condition  {x0}  by  the  state-space  model 


* * *i  + r ui*l  * 

y-  - H xi  , i i 0 . (2.4) 

This  model  can  be  chosen  to  have  the  given  transfer  function 

T(z)  - H(zI-*)-*zr  - ^ . (2.5) 

Note  that  for  convenience  we  have  used  a model  driven  by  ui+1,  rather  than  the 
more  commonly  used  model  driven  by  [MKD]  since  they  can  be  related  [Mo]. 

Given  the  transfer  function,  a simple  way  to  choose  the  matrices  { H,  4,  T } is 
the  "observer  canonical"  form 

H - #1T  , T - [bq\  oT)T  , (2.6) 

where  T denotes  transpose,  a1;n/0A-  [aj,  . . .,  «B)T,  a-  [60 Z is  the 

"delay  matrix":  Z^.  /Oa-  if  (j-i  - 1)  then  1 else  0 , and  e±  /Da-  [1,0,  . . . , 0]T  is  the  * 
first  unit  vector.  The  state-space  model  provides  a convenient  way  of  computing 
the  covariance  function  of  the  output  process.  Even  though  the  underlying  model 
{ H,  $,  r } or  { } i s time-invariant,  the  output  process  (y(>) } is  in  general 

not  stationary  due  to  "transients"  caused  by  the  initial  conditions.  However,  if  * is 
a stability  matrix  where  all  eigenvalues  have  magnitude  less  than  one,  then  as  i -*• 
the  transients  eventually  die  out  and  the  process  becomes  stationary.  In  the 
stationary  case,  the  covariance  is  a function  of  |i-j|  given  by 
Ry{i,j)  - H II  HT  , where  II  is  the  state  covariance  matrix  as  (see 
[DKM]). 

ARMA  models  and  state-space  descriptions  are  just  two  different  methods  of 
representing  the  input-output  relations  of  a FDLS,  and  they  can  be  closely  related  to 
each  other  using  matrix  fraction  descriptions  (MFD's)  [DKM].  A lesser-known  class 
of  FDLS  representations  are  the  ladder  realizations,  which  are  discussed  in  section 
V and  are  also  related  to  the  ARMA  and  state-space  models. 
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In  modeling  a process  as  a FDLS  driven  by  white  noise,  the  observed  data  is 
usually  available  in  one  or  more  of  the  following  forms.- 

a)  input-output  pairs  { u(-),  y(‘) }; 

b)  impulse  response  or  related  sequences  such  as  moments,  (or  moment  estimates, 
e.g.  obtained  from  input-output  pairs); 

c)  covariance  functions  or  second  order  knowledge  of  inputs  and/or  outputs,  (or 
estimates,  e.g.  from  the  impulse  response). 

Batch  methods  are  used  when  data  is  accessed  in  blocks,  as  in  b)  and  c);  efficient 
methods  for  determining  model  parameters  are  recursive  in  order.  .Recursive  (in 
time)  methods  are  appropriate  when  data  is  accessed  sequentially,  as  in  a).  Model 
parameters  can  then  be  estimated  recursively  both  in  order  and  time.  Table  1 
illustrates  the  modeling  methods  that  we  will  discuss;  they  are  divided  first  into 
batch  and  recursive  groups,  then  by  model  class:  Rlccati  or  square-root  methods, 
"fast"  algorithms,  and  their  ladder  forms.  Within  each  class  the  name  or  code  for 
each  method  appears  along  with  some  pertinent  references. 


III.  Batch  Methods 

When  observed  data  is  available  in  blocks,  batch  modeling  methods  are 
convenient.  We  will  first  consider  AR  models  because  of  their  widespread  use 
[Makh].  The  r developed  "linear  predictive  coding”  (LPC)  speech 

compression  schem  example,  are  a direct  application  of  least-squares  fitting 

of  AR  models  [AH],  [IS],  [Wak]  (for  a survey  see  [MG]).  AR  models  have  also  been 
very  useful  in  statistics  [Par],  [BJ],  spectral  analysis  [UB],  [Aka],  and  multichannel 
geophysical  applications  [Hob],  [WR], 

Normal  Equations 

It  is  well  known  in  least-squares  problems  that  the  parameters  of  an  AR  model 
satisfy  a set  of  linear  equations  called  the  normal  equations  (see  [K-S74],  [MG])  i 

Bn  Rn  “ I1*-4! -«B]  Rn  " 0]  . (3.1) 

An  alternate  form  is  the  Yule- Walker  equation  [Par]  : 

al:«T  *„-l  m[al a J R*-l  " Irl rnJ  • <3-2> 

In  both  forms  R is  a covariance  matrix  and  the  a^'s  are  the  "predictor”  or  AR  model 
parameters.  Rf  is  the  "prediction  error"  or  innovations  covariance,  a 
non-increasing  function  of  n (typically  the  model  order).  In  speech  processing,  two 
popular  methods  of  obtaining  the  normal  equation  are  the  "autocorrelation"  method 
and  the  "covariance"  method  [MG],  but  there  exist  many  ways  of  estimating  the 
covariance  Rn  [BJ],  [MDKV],  [Di],  General  methods  for  solving  such  linear 
equations  include  Gaussian  elimination  (GE),  Cholesky  decomposition,  Householder 
transforms  [Hou],  [GGMS];  however,  they  all  require  computations  of  Oin\ 

The  Levinson- Wiggins -Robinson  (LWR)  Algorithm 

An  algorithm  that  requires  only  0(n2)  computations  for  the  recursive  solution  of 
normal  equations  with  Toplitz  covariance  matrices  (corresponding  to  an 
assumption  of  stationarity  of  the  process)  was  first  described  by  Levinson  [Lev] 
and  later  extended  by  Wiggins  and  Robinson  [WR],  By  making  use  of  certain 
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shift- invariance  properties  of  Toplitz  matrices  (the  i,j- th  entries  are  only  a 
function  of  i-j ),  this  algorithm  solves  the  normal  equation  via  a set  of  recursions 
that  update  the  AR  parameters  or  the  predictor  parameters  in  increasing  order 
[Rob],  [K-S74].  The  Levinson  recursions  are  also  closely  related  to  the  orthogonal 
Szego  polynomials  [Sze],  [GS],  [KVM].  Levinson's  algorithm  plays  a> central  role 
because  it  can  be  generalized  to  handle  multi-channel  data  [WR], 
multi-dimensional  or  image  processing  problems  [LKM],  nonstationary  processes 
with  "shift-low-rank.”  [Mo],  [FMKL],  ladder  realizations  [MV],  [MVK]  and  ARMA 
or  state-space  models  [MKD],  [MKL],  [DKM]. 

ARMA  Models 

In  state-space  terms,  the  problem  is  to  find  a triple  { H,  $,  T } such  that 
rg*  - H4T,  where  { Ti } is  a given  set  of  "first  order"  data  characterizing  the 
impulse  response  of  a linear  system.  This  is  the  partial  realization  problem  [KFA], 
[DMK].  The  central  role  in  this  realization  theory  is  played  by  the  Hankel  matrix 
with  entries  ■ T . The  columns  or  rows  of  this  matrix  are  known  to  span 
the  state-space,  so  any  method  for  finding  a basis  is  a viable  realization  method.  Of 
particular  interest  are  methods  for  finding  the  smallest  basis  resulting  in  minimal 
order  n realizations  [HK],  [Si],  [YT];  they  all  require  0(n3)  operations. 

From  a transfer  function  polnt-of-view,  the  partial  minimal  realization  problem 
is  that  of  finding  two  relatively  prime  (matrix)  polynomials  a(z)  and  b(z)  such  that 
the  given  power  series  T(z)  matches  say  k terms  of  the  expansion  of  b(z)la(z)  . This  is 
the  classical  Pade  approximation  problem,  which  in  the  scalar  case  yields 

T(z)  a(z)  - b(z)  + [ terms  in  z~l,  i > k-n  } . Equating  coefficients  of  z’*,  0 s i S n , 

/ 

we  get 

“ ®n  • or  al^  " “ ^*n*l ^ In F • (3.3) 

where  ZTn  is  a Toplitz  matrix  containing  the  reversed  column  ordered  Hankel- 
matrix  Hn  . Note  the  similarities  here  to  the  normal  equation  (3.1)  and  to  PTony*s 
method  [MG]. 


Again,  standard  methods  could  be  used  to  solve  for  a„,  but  they  all  require 
computations  of  0(n3).  However,  if  one  takes  advantage  of  the  structure  of  the 
Hankel  or  Toplitz  matrices,  fast  algorithms  can  be  found.  Such  algorithms  have 
been  developed  (in  a coding  theory  context)  by  Berlekamp  [Be]  and  for  minimal 
realization  by  Massey  [Mass].  Multivariable  versions  have  also  been  developed 
[Mo],  [DMK].  These  recursions  are  strikingly  similar  to  Levinson's  recursions;  the 
Berlekamp-Massey  algorithm  can  also  be  related  to  orthogonal  Lanczos  polynomials 
[Lane],  [Mo].  An  alternative  method  for  obtaining  stable  partial  minimal 
realizations  is  discussed  in  Appendix  A It  can  also  be  derived  by  considering  a 
Gram-Schmidt  (GS)  orthogonalization  on  the  columns  of  the  Hankel  matrix  HR  or 
the  Toplitz  matrix  &n  [Mo],  or  more  generally  by  using  projection  methods 
[KKM]. 

Spectral  Factorization  and  Innovations  Representations 
The  problem  of  obtaining  a model  of  a process  {?(•)},  given  its  covariance 
function  or  second  order  information,  is  called  stochastic  realization.  We  are 
interested  in  representing  { y(-) } as  the  output  of  a linear  model  driven  by  white 
noise.  In  geheral,  there  exist  many  such  models,  however  the  only  stable  and 
stably  invertible  model  is  the  (unique)  innovations  representation  (IR)  [K-S74]. 
The  inverse  model  is  the  whitening  filter  that  produces  a white  noise  process,  the 
Innovations  { <(*) },  when  driven  by  the  observed  data.  In  discrete- time  or  time 
series  analysis,  the  innovations  are  the  one-step  prediction  errors  of  the 
observations.  If  the  process  { /•) } is  stationary,  the  problem  of  obtaining  the  IR 
essentially  reduces  to  one  of  spectral  factorization. 

An  efficient  method  for  obtaining  the  spectral  factors  of  Sy(z) 

Sy(z)  - Kz)  K-z)  / a(z)  a(-z)  , > (3.4) 

is  given  as  a two-step  procedure  in  [DKM],  [MKD],  [Mo].  In  the  stationary  and 
scalar  process  case  considered  here,  the  truncated  or  one-sided  power  spectrum  Sjz) 
of  { yO  } is  formed  from  the  covariance  sequence.  A minimal  realization  algorithm 
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is  then  used  to  approximate  St(z)  by  q(z)la(z) , where  l/a(z)  is  the  AR-paxt  of  the 
desired  model.  From 

Sw{z)  - a(z)  Sy[z)  a(z~l)  - a{z)q(z~[)  + q(z)a(zA) 

- 6<z)  Hz'1)  (3.5) 

it  can  be  seen  that  the  spectral  factorization  problem  for  { y{>) } is  now  reduced  to 
the  simpler  factorization  problem  for  a MA-process  { w(-) } , where  the  factor  b(z)  is 
the  MA-part  of  the  desired  model.  It  can  be  obtained  by  Cholesky  and  other 
factorization  procedures.  Thus  { y{-) } is  modeled  by  the  cascade  of  the  AR  and  MA 
parts  driven  by  the  innovations,  a white  noise. 

In  the  time-domain  this  corresponds  to  a factorization  of  the  (stationary) 
covariance  (a  Toplitz  matrix)  into  triangular  factors.  The  "fast-Cholesky" 
factorization  given  by  [Mo]  is  an  efficient  algorithm  for  stationary  and 
non-stationary  covariance  matrices  with  "shift-low-rank".  It  should  be  noted  that 
many  popular  covariance  estimates  have  this  property. 


IV.  Recursive  "in  Time"  Methods 


When  the  observed  data  is  available  as  input-output  pairs  that  must  be  accessed 
sequentially,  recursive  modeling  methods  are  the  most  appropriate.  Many  recursive 
least-squares  methods  have  been  developed  in  the  identification  and  control  area; 
they  typically  involve  solving  Riccati-type  equations  and  have  computation  and 
storage  requirements  of  0(n^)  and  0{n“)  respectively.  Recently  "fast"  algorithms 
have  been  developed  with  reduced  computations  and  storage  of  O(n)  using  ARMA  or 
ladder  realizations . 

An  important  set  of  least-squares  recursive  methods  for  AR-type  models  is 
described  in  detail  in  [SLG]  and  more  recently  in  [MKL],  The  discussion  includes 
' least-squares  (LS),  weighted  least-squares  (WLS),  generalized  least-squares  (GLS), 
instrumental  variable  (IV)  and  recursive  maximum-likelihood  (RML1.2)  methods 
for  ARMA  models.  All  these  methods  solve  a Riccati  equation  that  recursively 
updates  the  inverse  of  the  matrix  appearing  in  the  normal  equation  of  the  problem. 

An  alternative  to  the  Riccati  equations  are  the  square-root  forms  discussed  for 
instance  in  [MK].  They  make  use  of  the  numerically  preferrable  orthogonal 
transformations  [Hous],  [GGMS]. 

A special  case  of  the  IV  method  is  obtained  by  using  the  n-step  delayed  outputs  as 
instrumental  variables.  This  can  be  shown  to  be  equivalent  to  a minimal  realization 
problem  given  (estimated)  covariances  R . Recall  that  in  the  given  (estimated) 
covariance  case  we  discussed  a two  step  procedure.  The  first  step  was  to  obtain  a 
minimal  realization,  or  rational  approximation  of  S+(z)  by  q(z)la(z),  or  in  the 
time-domain  of  R 4 by  Q A*1  [DKM],  [MKD].  In  matrix  notation  we  obtain 

R,  A m Q , . (4.1) 

where  A and  Q are  banded  matrices  of  "band  width"  n , if  the  underlying  linear 
model  is  of  that  minimal  order.  The  first  column  of  (4.1)  corresponds  to  equation 
(3.3) 

K \ ■ B„a,  - 0.  • 
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where  JS|(  is  the  (2,1)  block  entry  of  the  appropriately  partitioned  triangular 
Toplitz  matrix  . The  matrix  7?n  is  the  cross-covariance  of  the  last  n observations 
and.  the  same  set  n time-steps  delayed.  clearly  plays  here  the  role  of  the 
reversed  ordered  Hankel  matrix  JTb  . By  noting  that  the  Rlccati-type  equation  caa 
be  interpreted  as  a recursion  for  (low  rank)  updating  of  the  inverse  of  a matrix,  we 
see  that  the  IV  method  can  be  viewed  as  a recursive  (in  time)  updating  procedure 
for  the  minimal  realization  solution  for  a in  equation  (3.3). 

Fast  Algorithms  for  Recursive  "in,  Time"  Methods 

In  [MKL]  the  development  of  "fast"  algorithms  for  the  recursive  least-squares 
methods  is  discussed  in  detail.  Efficient  recursions  for  time  and/or  order  updates 
for  AR-type  models  were  first  derived  in  [Mo].  The  basic  idea  there  was  the 
observation  that  the  matrices  encountered  in  many  least-squares  problems  have  a * 
"shift  invariance"  or  a "shift-low-rank"  property.  It  can  be  characterized  by  the 
(low)  rank  p of  the  shifted  difference  of  a matrix  M:  p [ M ” ZT  M 1 ] , where 
the  "delay"  matrix  Z was  defined  in  Section  II.  This  property  is  generated  by  the 
fact  that  these  matrices  are  sums  of  products  of  Toplitz  or  Hankel  matrices.  It  ean 
also  be  used  to  obtain  fast  Cholesky  algorithms  for  MA  processes,  thus  obtaining 
recursive  whitening  filters  of  the  AH  type  ( e.g.  RMH5  algorithm  in  [Mo]). 

Similarly  we  can  obtain  general  LWR  recursions  in  order  and  time  for  AR 
processes,  l.e.  updating  the  MA  prediction  (whitening  filter)  parameters  a-.  A 
surprising  feature  of  the  fixed-order  recursive-in-time  algorithms  is  that  explicit 
updating  of  the  covariance  estimate  is  unnecessary,  basically  because  the  model 
parameters  are  an  implicit  characterization  of  the  covariance.  Since  the  details  of 
these  algorithms  can  be  found  in  [MKD],[DKM],[Mo],[FMKL]  , we  shall  only  give  a 
comparison  of  the  LWR- type  algorithms,  assuming  that  the  reader  is  already 
familiar  with  the  Levinson  recursions  as  described  in  [Wie],  [WR],  or  more 
recently  [K-S*M]. 

The  recursions  for  the  generalization  of  the  LRW  algorithm  for  covariance 
matrices  exhibiting  a shift  invariance  property  have  a very  similar  form  to  the 
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original  Levinson  recursions.  However,  in  contrast  to  the  two  solutions  required  in 
the  LWR  algorithm,  the  so-called  forward  and  backward  predictors,  we  require  in 
general  more  solutions  for  non-Toplitz  matrices.  It  turns  out  that  the 
"shift-low-rank"  p of  the  covariance  matrix,  regardless  of  its  size,  is  equal  to  the 
number  of  solutions  required  in  the  recursions.  For  the  case  of  covariance 
estimates  that  can  be  written  as  products  of  two  Toplitz  matrices  (typically 
containing  input-output  data),  the  number  of  solutions  required  is  at  most  four  in 
the  scalar  case,  and  2m +2  for  m -channel  data. 

For  combined  ARMA  models  we  can  either  attempt  to  model  first  the  AR  or  the 
MA  part  and  then  try  to  estimate  the  remaining  part  of  the  model.  In  the  batch 
methods  of  Section  III  we  discussed  ways  of  obtaining  the  AR  part  first  via  minimal 
realization  and  then  the  MA  part  via  spectral  factorization.  The  other  order  of  first 
obtaining  the  MA  part  could  be  obtained  by  working  with  (an  estimate  of)  the 
inverse  of  the  covariance  matrix,  the  so-called  information  matrix  [MK]. 

The  cascaded  approach  can  be.  carried  out  also  in  time  recursive  form  by 
estimating  the  AR  part  via  a (fast)  recursive  form  of  the  IV  method,  as  discussed 
above,  cascaded  by  the  fast  Cholesky  recursions  for  a MA  process  (eg.  RMH5  in 
[Mo]).  The  only  difficulty  now  is  that  both  parts  estimate  the  models  in  the 
so-called  controller  or  "tapped  delay  line"  realization,  a dual  form  to  the  observer 
form,  which  cannot  be  merged  by  inspection. 

The  joint  innovations  representation  approach  discussed  in  Section  III  and 
Appendix  A is  ideally  suited  for  recursive  in  time  methods.  Even  though  the 
driving  inputs  (conceptually  the  innovations)  are  usually  not  available,  they  can  be 
replaced  with  their  best  estimates  obtained  by  using  the  best  current  parameter 
estimates.  This  is  clearly  only  possible  for  methods  with  sequential  data  access.  It 
turns  out  that  this  seemingly  "suboptimal"  approach  of  substitution  has  itself 
optimality  properties  (see  eg.  [SLG]);  a similar  situation  occurs  in  detection  of . 
unknown  signals,  and  in  the  famous  separation  result  of  linear  quadratic  control 
using  state  estimates  [KFA].  The  recursive  maximum  likelihood  methods  in  [SLG] 
and  [MKL]  can  be  derived  from  such  an  approach.  Once  estimates  of  current 
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In  Section  II  we  discussed  the  realization  of  a given  transfer  function  T(z)  - 
b{z)ja{z)  via  transfer  function  or  state-space  type  models  such  as  ARMA,  controller, 
or  observer  canonical  forms.  If  the  roots  of  a(z)  are  known,  T{z)  can  be  represented 
by  a partial  fraction  expansion.  Using  polynomial  evaluation  formulas  we  can 
obtain  the  so  called  Jordan  canonical  or  parallel  decomposition  form  (even  for 
matrix  transfer  functions)  [Mo].  The  Jordan  form  has  the  nice  property  tliat 
stability  can  be  checked  by  merely  inspecting  the  magnitude  of  the  roots  . Finding 
the  roots,  however,  is  numerically  sensitive. 

The  ladder  (or  lattice)  canonical  realizations  provide  a very  promising 
alternative.  They  also  have  the  property  that  stability  and  even  minimum-phase 
can  be  checked  by  inspecting  the  PARCOR  or  reflection  coefficients  [IS],  [Wak], 
[MG],  [Mo],  [Cla2].  In  contrast  to  the  methods  for  finding  roots  of  polynomials  this 
requires  only  a finite  algorithm,  the  Schur-Cohn  test  for  stability.  This  is  actually 
equivalent  to  the  Levinson  or  orthogonal  Szego  polynomial  recursions  performed  in 
decreasing  order  on  b.  or  a^z) , [K-S74],  Given  the  stationary  covariance  matrix  R , 
i.e.  second  order  information,  the  a^'s  and  the  reflection  coefficients  can  be 
computed  via  the  LWR  algorithm.  From  a stochastic  process  point  of  view  we 
identify  these  coefficients  with  the  partial  correlation  (PARCOR)  coefficients  or 
singular  values.  They  also  have  physical  significance  in  the  scattering  theory  of 
waves  [IS],  [Wak],  [K-S74],  [MV],  [MVK]. 

The  Levinson  algorithm  can  actually  be  carried  out  using  only  the  reflection 
coefficients  as  parametrization,  since  the  inner  product  k.  of  a vector 
r /Da-  [rj , . . . , rt0T  and  the  coefficient  vector  a._j  can  be  obtained  as  the  current 
output  ki  of  a ladder  realization  driven  by  a previous  input  sequence  containing 

{rj rj  . Similarly  we  can  translate  other  algorithms  for  AR  models,  such  as  the 

various  generalized  Levinson  algorithms  [Mo],  [DKM],  [LKM],  Appendix  A,  into 
their  ladder  form  equivalents  as  in  [MVK],  [MV],  Appendix  B.  These  forms  are  of 
interest  by  virtue  of  their  stability  properties  and  their  numerical  robustness  ~ 
they  typically  require  sample  correlation  operations.  These  forms  have  also 
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canonical  [Mo]  and  invariance  properties  [MVK],  as  well  as  minimal  storage 
requirements  for  modeling  algorithms,  as  seen  from  a comparison  of  the  the 
recursions  for  the  PARCOR  and  the  a.  parameters  in  Appendix  B. 

ARMA  models 

Recall  that  the  first  step  of  realizing  an  ARMA  model  in  Section  m was  a minimal 
realization  problem.  The  solution  to  this  MR  problem  can  be  carried  out  in  ladder 
form  by  using  a Berlekamp  Massey  (BM)  - type  algorithm.  These  recursions  are 
actually  very  similar  to  the  LWR  recursions,  as  noted  in  [Mo];  therefore  we  can  use 

t 

an  analogous  derivation  to  obtain  ladder  forms  for  the  BM  recursions,  as  presented 
in  [GrMo].  m * 

Alternatively,  the  joint  IR  approach  explored  in  Appendix  A,  leads  (even  for 
scalar)  processes  to  the  theory  of  multichannel  ladder  realizations  of  the  AR  type 
discussed  above.  Since  we  embedded  the  underlying  ARMA  model  in  a two  channel 
AR  model,  the  IR  model  will  again  be  of  order  2 n , i.e.  non-minim al.  This  would  also 
hold  for  a ladder  realization. 

Minimal  models  can  be  obtained  by  merging  the  AR  and  MA  parts  In  the  observe; 
form.  It  is  also  possible  to  obtain  a minimal  rational  ladder  form  [MG],  [Mo],  The 
basic  idea  in  state-space  terms  is  to  add  a suitable  input  matrix  (I*)  or  output  matrix  ' 

(H)  to  a ladder  form  realization  of  the  AR  part  of  the  model;  this  is  possible  since 
the  ladder  forms  are  controllable  ( or  their  dual  observable  ) [Mo]. 

The  second  step  of  the  stochastic  realization  procedure  of  Section  in  requires  a 
spectral  factorization  for  the  determination  of  the  MA  part.  As  indicated,  we  need  _ 
to  determine  the  triangular  factors  of  the  (banded)  covariance  matrix  of  the  MA 
process.  They  can  be  obtained  from  the  Cholesky  factors  or  the  RHS  of  the  LWR 
recursions.  Similarly  it  is  possible  to  obtain  the  ladder  realizations  of  the  MA  part, 
since  the  fast  Cholesky  recursions  Hby  rows"  have  the  form  of  a state-space 
equation  with  a dynamic  matrix  $ that  has  precisely  the  same  form  as  the  $ matrix 
of  a feedback  ladder  form  In  state-space  notation  [Mo].  As  for  the  LWR  recursions, 
there  similarly  exists  a ladder  form  of  the  fast  Cholesky  algorithm  that  requires 
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only  reflection  coefficients  as  parametrization. 

i 

Ladder  Forms  for  Recursive  "in  Time”  Methods 
The  ladder  forms  for  exact  least-squares  solution  to  AR  modeling  have  been 
developed  in  [MV].  In  Appendix  B,  we  shall  present  the  simplest  one  of  the  many 
possibilities  of  such  ladder  forms,  the  "prewindowing"  case  [MDKV].  It  is 
interesting  to  note  that  the  partial  correlation  coefficients  are  computed  as  sample 
cross  correlations  between  the  "forward"  and  "backward”  prediction  errors  as 
expected  from  the  stochastic  derivations  of  the  ladder  forms  [Wak],  [Mo],  [5KM]. 

The  ladder  forms  of  the  GLS  and  RML1/2  methods  discussed  in  [SLG],  [MKL]  and 
Section  IV  can  be  obtained  by  embedding  the  models  in  an  appropriately  augmented 
AR  model  as  in  Appendix  A . The  IV  method  led  to  nonsymmetric  Riccati  equations, 
therefore  the  fast  versions  also  require  a nonsymmetric  form  of  the  LWR 
recursions.  However  it  is  clear  that  these  recursions  are  then  of  the  BM  type  since 
this  algorithm  also  works  with  nonsymmetric  (though  triangular)  Toplitz 
matrices.  Therefore,  we  could  obtain  "nonsymmetric"  ladder  forms  of  the  type 
given  in  [GrMo].  Although  the  final  algorithms  of  these  ladder  forms  are  simple  to 
implement,  the  exposure  of  the  "shift-invariance"  is  in  general  nontrivial  [MV], 
Our  preliminary  experience  with  the  numerical  properties  of  these  algorithms 
has  been  very  encouraging;  in  general  ladder  realizations  are  superior,  to  direct 
forms  for  computing  estimates  of  the  coefficients  of  c(z)  and  6(z)  . Stochastic 
approximation  or  gradient  type  methods  using  ladder  forms  can  be  obtained  easily, 
e.g.  [SV];  however,  they  have  drawbacks  similar  to  other  stochastic  approximation 
methods,  especially  for  covariance  matrices  with  extreme  eigenvalue  distributions. 
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Table  It  Classification  of  Least-Squares  Modeling  Methods 


Cho)c«ky,RE,3Qi  [OK],5oc.IV  Feet  Chole«ky,RMII6t  5cc.IV, [Mo]  Rcc.Fecdb.Udden  5cc.1V, [Mo] 


Appendix  A:  Stable  Partial  Minimal  Realizations 
In  this  appendix  we  show  how  to  obtain  stable  partial  minimal  realizations  of 
the  Joint  impulse-response  and  covariance-matching  type.  It  will  turn  out  that  we 
! can  obtain  an  ABMA  model  by  imbedding  it  in  a two  channel  AA  modeling  problem. 

Given  an  ABMA  model  as  represented  by  the  difference  equation  of  section  II,  we 
can  rewrite  it  as  (let  q - n) 

Jt  + V«-l  + • • + *Jt-n  ~ *1  Vi  " • • • - bnut-n  " V.  • <A1) 


Deterministic  Case 

We  first  consider  the  deterministic  case  where  we  are  given  impulse  response 
data  or  the  Markov  parameters.  Writing  the  input/output  relationship  in  matrix 
notation  (see  sec.  m)  yields 
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where  Tn  is  a lower-triangular  and  is  a full  Toplitz  matrix  of  the  Markov 
parameters  (Hj  is  the  column  reverse  ordered  Hankel  matrix).  Letting  T-*m,  we  get 


the  normal  equation 
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Stochastic  Case 

From  a stochastic  process  point  of  view  we  can  express  the  normal  equation 
associated  with  the  augmented  AR  model  as 

£{[y1lcytTu,T][  a ol}  - E { Mi utb0  ut  ] } 
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We  can  solve  for  the  normal  equation  of  aR  t 

*„a»  - t«.  - r.'r.la.  . ■ «i*‘,  • CAT) 

• 

The  equations  (A5),  (A6),  and  therefore  the  non-Toplitz  equations  (A7)  (!)  can  be 
solved  recursively  with  the  LWR  algorithm.  Note  that  if  ■ 0 , the  minimal 
order  n - k.  We  could  bring  equations  (A5),  (A6)  into  a more  familiar  form  by  the 

interleaving  permutation  (1,3,5 2n- 1 , 2, 4, 6, ... , 2n+2),  cf.  [MDHV],  to  convert 

the  two-process  covariance  matrix  into  a n by  n block  Toplitz  matrix,  with  2 by  2 
blocks,  however  the  LWR  algorithm  clearly  applies  to  both  representations  with 
suitable  modifications. 

Thus  we  have  shown  that  the  Joint  impulse  response/covariance  matching  ‘ 
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problem  is  equivalent  to  solving  a set  of  normal  equations  associated  with  a two 
channel  AR  modeling  problem.  Since  the  predictor  for  the  Joint  process  is 
triangular  and  minimum  phase,  the  denominator  a„  of  the  underlying  ARMA 
model  is  also  minimum  phase  and  therefore  stable,  (for  all  k). 

Equations  (A5)  and  the  elegant  stability  proof  were  actually  first  obtained  by 
Claerbout  [Clal]  via  a least-squares  rational  approximation.  The  connections 
between  the  Joint  innovations  representation,  the  augmented  normal  equations, 
and  the  Hankel  matrix  were  pointed  out  in  [Mo]  and  also  in  [MDKV],  [MKD], 
[DKM],  where  algorithms  were  given  to  solve  equations  of  the  type  seen  in  (A6) 
and  (A7).  For  the  special  case  where  R has  a "shift-low-rank"  of  one  of  the  type 
(B5),  called  the  post-windowing  method  in  [MDKV],  a Levinson-type  algorithm 
was  given  recently  by  [MR].  The  stability  property  of  the  AR  modal  was  proved 
there  using  a somewhat  more  complicated  Lyapunov  technique. 
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Appendix  B:  LS-Recursions  for  Ladder  Forms 

The  Fre  wind  owing  Case 

Given  a series  of  observations  0<«T},  where  { y(-) } can  be  m vectors,  we 
wish  to  find  the  least-squares  one-step  predictor  of  order  p parametrized  by  the 

(matrix)  coefficients  {A  j(j),  M . We  can  define  many  different  squared 

error  criteria  Zp  j»,  for  instance  as  a function  of  t and / in 

Ep,T  • Z ^ p,T  • . 


A\t i A'p/p)l  , y'{M-p]  £ [ytJ (Bl) 

An  obvious  choice  from  an  innovations  point -of -view  is  (s-O./eD,  the 
"pre-windowing"  case  [MDKV].  If  s - p and  / - T the  so-called  "covariance" 
method  is  obtained,  and  if  s - 0 and  / - 7 ♦ p we  get  the  pre-  and  post-windowed 
case  or  the  "correlation"  method  [MG].  The  total  squared  error  can  be  expressed  as 

EP,T  " tr  ( Rp,T  Ap,T  i • Rp,T  ‘ rp,TYTp,T> 


rhT  £ [ y[<k-pl  , y[\:-p+ 1] ylT-T-pl  ] <B2) 

Thus  the  problem  of  determining  A^  by  minimizing  Z^j  leads  to  . . 

1 * vyin  . (B3) 

Although  Rpj>  is  not  Toplltz,  it  still  carries  a certain  shift-invariance  structure, 
given  by  the  following  Identities 


RpJ  " *»r-l  + rtTT-p)  ytT:T-p )T 


X X 


X R 


V ix  * 

XXX 


Define  the  backward  predictor  B^j-  and  the  smoothing  errors  C^p 


(B5) 


B\T  RpJ  * I 0 • • • . 0 . ; C\T  £ y[T :T-p]  . (B6) 

Then  the  forward  and  backward  prediction  errors  (innovations),  i^p , and  rftp , and 
an  auxiliary  scalar  1 can  be  defined  by 

t • r'»T ■ V1  4 y'{TT-P]  ‘V'V'V1' 


Order  Update  Recursions 


Using  the  three  shift-invariance  identities  for  R^ y (B5)  and  using  some 
symmetry  properties,  the  order  update  recursions  for  Apj>  , Bpy  , Cpy , Rtptj' » and 
Rrp,T  are 


A\*\J 

m 

t *pj>  0 3T 

- K\,T  R~rp,T-l  l °-  B\,T- 1 3T 

» 

*W 

m 

1 °»  Bp,T-i  3t 

- Kp,r  R~t p,T 1 ^TP)r*  0 3T 

(B7) 

cVw 

m 

O 

4 

o 

+ rVi,rB'W  bVi,t  where 

kpj 

m 

[ last  block  row  of  Rp*\j  3 t ^ pj<  0 3T 

m 

t 0,  B'pj_ j 

) [ first  block  row  of  J?p+1  T ]T. 

m 

K\>T  B~rp,T-l  Kp,T 

m 

WpJ-\ 

kp,t  r4p,t  k\,t  • 

(B8) 

The  order  update  recursions  are  very  similar  to  the  multivariate  version  of  the 
Levinson  algorithm,  and  a similar  set  of  recursions  for  time-update  can  also  be 
obtained  [MDKV],[Mo]. 

Ladder  Type  Realization 

Premultiplying  the  above  equations  by  ytTT-^+l]  , we  obtain  the  following 
order  update  recursions  for  lpy , r^y , Y^y 


Vi,r  - 

V 

- K'pj^P'T-  lrp,T-l 

W " 

rp,T-l 

- KpjR-tpjtpj 

Vu  • 

VP,T 

+ VU  • 

(B9) 

The  " Kalman  gain " Kp  T is  obtained  from  [MV]  as  follows 

KpJ*\  " KpJ  * rpJ  1 ( 1 ” Vl.T ) .(BIO) 

and  the  reflection  or  PABCOB  coefficients  are  obtained  by 

*'(  T ‘ hr r\t ■ k\t  : k\t ■ (Bit) 


1 

; I 


The  Initial  conditions  are  given  by 


for  piT : 


*0,7 

" r0,T 

“ ‘Jt 

a 

r 

r 

“ r\t 

- 2 

l-O 

m *T,T  • 

rP.T 

- rj 

*P,T 

- 

*> 

- 

i 

f-l,T  “ 0 • 


‘w»*l 


rT,T  ' 7p,T 
T,T>  Kp,T 

Jo  Wi'- 


- 7 


T,T  • 


- 0; 


As  the  dual  to  the  stochastic  forms  in  [IS],  [Wak],  [Mo],  [SKM],  equations 
(BS)-(Bl  1)  are  a complete  set  of  order  and  time  update  recursions  to  obtain  the 
exact  least-squares  ladder  farm  predictor,  which  is  shown  in  Figure  i. 


Figure  1.  Ladder  realization  of  exact  one-step  least-squares  predictor. 


The  recursion  (BIO)  computes  the  sample  cross-covariance  of  the- forward  and 
backward  innovations,  using  the  optimal  weighting  !/( 1-7., .),  compared  to  other 
suboptimal  schemes  [SV].  In  the  scalar  case  if  ^q-0,  or  in  general  if  7^_ j j-e  1 , 

since  0s7^jst  [MDKV].  If  at>|,  we  require  Tip+m.  These  singularities  he 
avoided  by  including  a priori  estimates  of  the  covariance  or  equivalently 
including  a weighted  norm  of  the  predictor  in  the  error  criteria  Several 

such  modifications  have  been  proven  useful  in  actual  implementations. 
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Abstract 

LeJder  forms  sro  probably  the  most  promising  canonical  forms  in 
estimation,  and  system  identification.  Many  recent  applications,  such  as 
in  geophysical  signal  processing,  high  resolution  pnaiimuin  entropy*) 
spectral  estimation  and  speech  encoding,  justify  the  interest  in  these 
forms.  They  appear  In  many  contests,  such  as  scattering  and  network 
theory  end  the  theory  of  orthogonal  polynomials.  The  slate-space  model 
ladder  realisations  are  scry  closely  related  to  (block)  Schwarx  matria 
canonical  forms,  which  generally  appear  in  the  contest  of  stability 
analysis.  In  fact  they  ara  the  natural  ‘stability  canonical  form*  for 
(diacrete-timc)  l.yapuoo*  equations  since  the  associated  positirc  definite 
(covariance)  matrices  are  diagonal  resp.  an  Identity.  This  fact  leads  also 
to  clnso  connections  to  squaro-root  algorithms  including  the  once  of 
Cholcshy  and  Chandrasekhar  type,  since  again  ladder  forma  are  the 
natural  eaxonlcal  forms.  In  realisation  theory  these  forms  are  obtained 
oia  orthonormal  state-space  bases  using  Cram-Schmidt  type  procedures, 
ladder  forms  base  many  ether  advantages,  such  us  lowest  computational 
complexity,  good  numerical  bcharior,  stability  'by  inspection*  proportion 
and  relations  to  physical  properties  such  as  reflection  or  partial 
correlation  coefficients,  and  perhaps  absorption  coefficients. 

Vo  shall  present  an  outline  of  some  newer  results  connecting  these 
topics  and  present  new  esamples  of  our  new  esaet  least-squares 
recursions  for  (*adapli«e*)  laJder  forms  with  poles  and  seros  We  close 
with  • few  emulation  examples,  including  the  identification  of  P layered 
media  (via  ultra  sound) 

I.  Introduction 

Ladder  forms  have  ailracicd  much  attention  recently  because 
they  are  probably  (he  most  promising  canonical  forms  in  estimation 
and  system  identification.  These  forms  have  appeared  in  many 
applications  such  as  geophysical  signal  processing  for  quite  some 
time  , and  more  recently  such  models  are  being  used  in  high 
resolution  (‘maximum  entropy*)  spectral  estimation  and  speech 
encoding.  Ladder  ( sometimes  called  lattice  - a term  we  would  like 
to  reserve  for  two  and  higher  dimensional  extensions  [LKM])  forms 
appear  in  many  contexts,  first  perhaps  in  scattering  and  network 
theorv  where  the  scattering  of  waves  in  layered  media  or  in 
(non-homogeneous)  transmission  lines  leads  very  naturally  to 
ladder  forms,  see  e«.  [CU2],  fLKF],  [RMY],  [Kellyl 

Ladder  forms  appear  explicitly  but  more  often  implicitly  in 
many  contexts.  They  are  directly  related  to  the  scattering  of 
waves  and  iherrfure  perhaps  first  introduced  in  physics.  Some  of 
the  associated  mathematics  are  used  in  network  theory,  where  the 
caxcade  sit  victors  of  t!*  Udder  forms  p'.sy.  an  tote.  The 

notion  of  transfer  functions  leads  very  naturally  to  the  next 
connection,  the  theory  of  orthogonal  polynomials.  They  in  turn  also 
appear  in  the  stability  analysis  of  linear  systems.  The  state-space 
models  that  are  related  to  orthogonal  (matrix)  polynomials  are  the 
so-called  (block)  Schwarx  matrix  canonical  forms,  see  c.g.  [AJM], 
fSSl  However,  the  special  structure  of  these  matrices  leads  very 
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directly  to  the  ladder  realisations  IMoJ.  In  fact  they  are  the 
natural  "stability  canonical  form*  (or  (discrete-time)  Lyapunov 
equations,  since  the  associated  positive  definite  (covariance) 
matrices  are  diagonal  or  respectively  an  identity  matrix.  The 
similarity  transformations  to  this  form  involve  a matrix 
square-root  of  the  associated  covariance  matrix.  The  ladder  forms 
are  therefore  closely  connected  to  square-root  algorithms 
including  the  ones  of  Cholesky  and  Chandrasekhar  type.  In 
realization  theory  these  forms  are  obtained  via  orthonormal 
state-space  bases  using  Cram-Schmidt  type  procedures,  due  to  the 
fact  that  this  ortho-normalization  is  again  related  to  matrix 
square-root  and  orthogonal  polynomials. 

Ladder  forms  have  many  other  interesting  properties.  Due  to 
the  fact  that  they  are  in  many  problems  the  "natural  canonical 
form",  they  lead  to  algorithms  with  lowest  computational 
complexity  compared  to  other  canonical  forms.  Although  a detailed 
study  is  still  outstanding,  there  are  many  indications  that  this  form 
leads  to  good  numerical  behavior  of  the  associated  algorithms,  a 
properly  that  is  not  shared  with  most  canonical  forms. 
Furthermore,  the  stability  "by  inspection"  property  given  the 
ladder  coefficients  is  shared  only  by  the  Jordan  or  modal  canonical 
form.  However,  the  latter  one  requires  the  knowledge  of  the 
eigen-values  that  are  in  general  not  very  easily  obtained,  compared 
to  the  finite  algorithm  required  to  gel  the  ladder  coefficients. 
They  in  turn  have  other  interesting  interpretations  and  relations 
to  physical  properties  such  as  reflection,  and  perhaps  absorption 
coefficients.  In  stochastic  process  modeling  and  spectral  estimation 
the  ladder  coefficients  turn  out  to  be  partial  correlation  or 
canonical  correlation  coefficients,  which  leads  to  very  simple 
methods  to  determine  these  parameters  either  from  covariance 
data  or  even  directly  from  measured  data. 

In  [MLNV]  we  presented  a classification  of  exact  least-squares 
modeling  methods.  The  material  discussed  here  is  a sequel  to  the 
results  discussed  there,  in  particular  we  will  concentrate  here  on 
the  ladder  forms  and  the  associated  algorithms.  We  shall  present 
an  outline  of  rome  newer  results  course  ling  these  topics  and 

present  new  examples  of  our  new  exact  least-squares  recursions 
for  ("adaptive")  ladder  forms  with  poles  and  zeros.  We  close  with 
a few  simulation  examples,  including  the  identification  of  » layered 
media  (via  ultra-sound). 

II.  Ladder  Realizations 

In  (MLNV]  and  (MVL)  we  discussed  various  ladder  realizations. 
We  assume  here  familiarity  with  this  material  and  would  like  to 
give  here  only  a missing  link  to  state  space  realizations,  namely  the 
fact  that  the  ladder  forms  ean  be  obtained  via  an 
ortho-normalization  of  the  stale  space.  In  this  context  it  is  welt 
known,  that  various  canonical  state  space  realization  can  be 
obtained  via  methods  that  construct  a basis  of  either  the  Hankel 
matrix  of  the  Markov  parameters,  resp.  the  impulse  response 
parameters  of  the  system,  or  bases  ef  the  controllability  or 
observability  matrices  of  the  system,  see  04.  [K-S7tl  Wo  will 
present  here  an  outline  of  the  scalar  discrete-lime  constant 
parameter  case.  For  convenience  we  use  an  intermediate  canonical 
form,  the  controller  form.  It  has  the  properly  that  the 
component  of  the  it  stale  vector  r'li)  can  he  obtained  from  the 


B1 


input  utx)  in  Z-transform  notation  via  **(«)  - a(x)z""*/o(x)  , where 
a(x)  is  the  characteristic  polynomial  The  ladder  forms  are 
obtained  in  a similar  manner  via  r,'(x)  • u (xM>'(x)/a(x) , where  4*(x) 

is  the  (dual)  ortho-normal  polynomial  on  the  unit  circle  iSivJ. 
Writing  these  facts  in  matrix  notation,  we  obtain  the  results  that 
the  the  state  at  time  n is  given  by  s„  - Cm  u[ii-1,0),  - [u;, 

—i  »j)  i where  Cp  is  the  usual  controllability  matrix.  For  the 
controller  form  it  is  now  not  too  surprising  that  C*  » 

, the  inverse  of  a unit  upper  triangular  Toeplitz 
matrix  of  the  coefficients  of  a(x) . the  Ladder  forms  on  the  other 
hand  result  in  (see  (MoJ)  Cj  - Bp  , where  B„  is  a 

lower  triangular  matrix  containing  as  rows  the  coefficient  vectors 
of  b‘(»L  Due  to  Us  orthogonality  properly  Ba  Br'  » Ra  , the  n by 
n Toeplitz  matrix  associated  with  the  (Z-transformed)  correlation 
function  RU)  “ l/lal*)!1 . Rm  is  also  the  steady  state  covariance 
matrix  of  the  controller  form  Rm',  if  u(x)  the  input  is  a white 
process.  Now,  since  the  similarity  transform  matrix  S from  one 
state  space  form  to  an  other  is  given  for  instance  by  the  ratio  of 
the  controllability  matrices,  it  is  clear  that  $ - C*  (C')'1  ■ B„  - 
i.e.  from  an  arbitrary  (controllable)  stale  space  form  the 
similarity  transform  is  given  by  the  inverse  square-root  of  the 
steady  state  covariance.  This  leads  Finally  to  the  connection  with 
Lyapunov  equation  type  characterization  of  the  ladder  forms, 
namely  that  their  covariances  are  an  Identity  (or  diagonal)  which  it 
precisely  the  characteristic  properly  of  Schwarz  matrices,  see 
[AJMl,  the  stale  space  feedback  matrix  of  ladder  forms  (Mol  Due 
to  limitations  we  defer  a more  detailed  discussion  of  the  details 
end  various  extensions  to  [Mil 

ill.  LS-Recursions  for  Ladder  Forms 
The  Prewindowing  Case 

In  [MLVKl  we  presented  this  case,  for  completeness  and  in 
order  to  correct  some  typographical  errors  we  repeat  some  of  the 
equations  here.  Civen  a series  of  observations  (yfilOsiiT),  where 
| y{.) ) can  be  m vectors,  we  wish  to  find  the  least-squares 
one-step  predictor  of  order  p parametrized  by  the  (matrix) 
coefficients  lArjli),i“lr.*p) . We  can  define  many  different 
squared  error  criteria  £>p  for  instance  as  a function  of  s and  / in 

E*r  " £ *VlW,eJW  • *«  " A'rJ  Yln-pl . 

A'^j  - 11/^1-^],  y'lct-p)  - (y/  . y.V&IM) 

An  obvious  choice  from  an  innovations  point-of-view  is  (i-0,/-r), 
the  "pre-windowing"  case  (MDKVl  If  a - p and  / - T the 
so-called  "covariance"  method  is  obtained  [MDKV],  [MVL],  and  if  s 
■ 0 and  / « T ♦ p we  get  the  pre-  and  post-windowed  case  or  the 
"correlation"  method  [MCJ.  The  total  squared  error  can  be 
expressed  as 

EeX  * u * A'eX  ReX  AeX  J • R,J  " YpJY',J> 

- [ y[0:-P) , ylh-p.l) ylT:T-p]  ] (111-2) 

Thus  the  problem  of  determining  AfT  by  minimizing  EpT  leads  to 

RpXApT  ’ • °>  • •°)'  • uB^p,t  " m|"  E,,r  dd-3) 

Although  is  not  Toplitz,  it  still  carries  a certain 

shift-invariance  structure,  given  by  the  following  identities 


«,T  - B.J.,  • y[TT-P)  yir.r-p)’  <*«**> 

* * 1 " f'W  *1 
l'  R,-tx-t\  l * ' ‘J- 

Define  llie  backward  predictor  lifT  and  the  smoothing  errors  c*r 

BVr *,.r  - [0 ,,•,«>):  CpTBpT  : WT-pW-U 

Then  the  forward  and  backward  prediction  errors  (innovations), 

, and  rfT  , and  an  auxiliary  scalar  1 p T can  be  defined  by 

[‘VprVr'Vl  : ytTST-iaJM^.f^.C^,.) 

Order  Update  Recursions 

Using  the  three  shift-invariance  identities  for  V (lll-S)  and 
using  some  symmetry  properties,  the  order  update  recursions  for 

*,X » Bf.T  • Cp.t  » R\x  *r* 

»V.,.r  " M,.ro)’  - AW  I 

B'p.t,r  “ t ®i  Bpj-i  1*  • (III*7) 

C,.t.T  " l c,r'Y  * rW**WBW 

A p.tT  - ( Uu  block  row  of  Rp.}T  ] ( A'pj,  0 ]' 

- ( 0,  1 ( />«•«  r«“>  •/  R,.\,t  Y- 

” ^,X  ’ AVur  R’rX-\  Ae*l.r 

R,,-\x  " - A,...r  R\x  AW- 

The  order  update  recursions  are  very  similar  to  the 
multivariate  version  of  the  Levinson  algorithm,  and  a similar  set  of 
recursions  for  time-update  can  also  be  obtained  (MDKVjJMcl 
Ladder  Type  Realization 

Premultiplying  the  above  equations  by  y[T:T-p*l]  , we  obtain 
the  following  order  update  recursions  for  *^r , rpT , tpj 


Vt.r  ’ 

*eX 

A'p»1.r  rrX-t 

ryt.T  ” 

rnX- 1 

A,.i,TJ,‘p,rVr 

Vi.r  “ 

Vr  * 

rVi.r  R%-ix  rp-tx  ■ 

The  "Kalman  rain"  A . r is  obtained  from  (lll-S), (III-T) 
MJMVII™ 

A,.,  j.i  - *,:.T  • Vt'Vt-.  / 1i-V..7OI|-1,> 

and  the  reflection  or  PARCOR  coefficients  are  obtained  by 

r : A«\i,r  RAtj  * r : 

The  initial  conditions  are  given  by 

V “ r».T  * *T  ' *-1 x " ° J 

r 


S.r 

" r»,T 

■ R\j 

-Ts 

%.r 

" *rjs 

R»  r 
eX 

" *r.rs 

Ap.lU>.| 

At  lK«  duU  to  th«  tiochttlic  forms  in  [IS],  [WsV],  (Mo],  (SKMl> 
equations  (III-8MIII-11)  are  a complete  set  of  order  and  a me 
update  recursions  to  obtain  the  exact  least-squares  ladder  form 
predictor,  which  is  shown  in  Figure  1. 
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Figure  L Ladder  realization  of  exact  one-step  least-squares 
predictor. 


a'y,  - b,'ut 

* beu,. 

where 

■’  - [>.«,. 

• • • * “•  J * 

y:  - 1 », 

Y,-«  J • 

I0.fr,. 

....  bj. 

ui*  * it*,.-... 

Y.  -»  J ■ 

Now  consider  the  following  augmented  equation 


( e, ' is  the  first  unit  vector).  This  equation  can  be  interpreted  as 
an  AR  model  for  the  joint  process  { y,  U } [Mu],  since  the  RHS 
is  equal  to  the  joint  innovations  of  { y,  U } , since 


The  recursion  (111-10)  computes  the  sample  cross-covariance  of 
the  forward  and  backward  innovations,  using  the  optimal 
weighting  1/(1 -If.,.),  compared  to  other  suboptimal  schemes  tSVl 
See  in  the  appendix  a sample  comparison  of  the  exact  versus  two 
approximate  methods.  In  the  scalar  case  H>r>0  if  y#-0,  or  in 
general  if  since  OsY^ySl  [MDKV],  [MVL1  If  m>l,  we 

require  Tip»m.  These  singularities  can  be  avoided  by  including  a 
priori  estimates  of  the  covariance  It,,  or  equivalently  including  a 

weighted  norm  of  the  predictor  a„  in  the  error  criteria  E r for 
tracking  of  time-varying  parameters,  e.g.  in  speech  modeling 
methods,  these  equations  can  be  modified,  either  by  puling  an 
exponential  weight  on  past  errors  as  discussed  in  [MKLj, 
(implemented  in  the  simulation  in  the  appendix).  Alternatively,  the 
lower  bound  of  the  error  criterion  in  (III-l)  can  be  increased,  e.g.  a 
" T - / , where  / is  the  (constant)  "sliding"  time  frame  width  of 
the  analysis.  This  corresponds  also  to  a sliding  window  on  the 
prediction  errors.  The  resulting  equations  are  similar  to  the  ones 
in  [MVLl 

Instead  of  computing  the  scalars  Y one  can  also  work  with  a 
second  set  of  prediction  errors  based  on  the  "old"  parameter 
estimates,  since 

- Vr.i  / < 1 - V,.r » • 

This  alternate  form  was  also  found  by  J.  Baker,  IBM  Yorktown 
(private  communication).  A similar  situation  occurs  in  the  Fast 
Cholesky  (least-squares)  algorithms  for  estimating  moving-average 
parameters^  via  feedback  filters  described  in  [Mo],  where  a "second 
Filter"  or  "predictor  filter"  appears  that  computes  variables  of  the 
»ype  «,tr(T*l)  . It  is  interesting  to  note  in  this  context,  that  the 
unwindowed  ("covariance")  method  actually  also  leads  to  signal 
feedback  paths  (actually  a smoothing  filter),  see  [MVLl  but  the 
simpler  prewindowing  case  is  feed  forward  only. 

Many  modifications  have  been  proven  useful  in  actual 
implementations,  they  are  partially  due  to  the  fact  that  many 
additional  identities  exist  and  others  are  due  to  differences  in 
numerical  behavior  and  trade-offs  in-  operations  count  and  memory 
requirements.  Systematic  experiments  are  now  in  progress  and  will 
be  reported  on  shortly. 

IV.  LS-Recursions  for  Rational  Ladder  Forms 
Rational  or  ARMA  Modeling 

Rational  or  pole-zero  or  ARMA  modeling  methods  were 
described  in  [SLCl  [MKL]  and  their  relation  to  joint  innovations 
representation  via  an  imbedding  of  the  ARMA  model  in  a two  (m) 
channel  AR  model  in  [MLNV]  and  [Mo]  The  same  idea  also  leads  to 
stable  partial  minimal  realizations  of  the  joint  impulse-response  and 
covariance-matching  type  [MLNV]  Civen  an  ARMA  model  as 
represented  by  the  difference  equation  we  can  rewrite  it  as 

y,  • • ••  - »,  - ...  - V*.  • «v-*) 


- 

y. 

- 

v*.l 

UJ 

(IV-3) 

i - j 

Stochastic  Case 

From  a stochastic  process  point  of  view  we  can  express  the 
normal  equation  associated  with  the  augmented  AR  model  as 

e ( kit  y;  «v  J)  [ » o]j  - r.  ( fy.lt  «,*, «,  jj  i 

J M w 

«.  r:  l[a  °1  • f*M’  *.‘.1 

r«  *■  Jr1 'il  [•»*»  (,v'4) 

We  can  solve  for  the  normal  equation  of  a,  : 


S a -IR  -T'T  ) a - [Ti  'H  la  - e.R*  (I.V-S) 

The  equations  CIV-41  and  therefore  the  non-Tbplitz  equations 
(IV-&)  (!)  can  be  solved  recursively  with  the  LWR  algorithm. 
Note  that  if  /?*,  » 0 , the  minimal  order  at  • Jfc.  We  could  bring 
equations  (IV-4)  into  a more  familiar  form  by  the  interleaving 
permutation  (1,3,5,-., 2n-l, 2,4,6, ..,2n*2),  cf.  [MDKVl  to  convert  the 
two-process  covariance  matrix  into  a n by  n block  Toplilz  matrix, 
with  2 by  2 blocks,  however  the  LWR  algorithm  clearly  applies  to 
both  representations  with  suitable  modifications. 

Thus  we  have  shown  that  the  joint  impulse  response  & 
covariance  matching  problem  is  equivalent  to  solving  a set  of 
normal  equations  associated  with  a two  channel  AR  modeling 
problem.  Since  the  predictor  for  the  joint  process  is  triangular 
and  minimum  phase,  the  denominator  »t  of  the  underlying 
ARMA  model  is  also  minimum  phase  and  therefore  stable,  (for 
all  k). 

Equations  (IV-4)  and  the  elegant  stability  proof  were  actually 
first  obtained  by  Claerbout  [Clal]  via  a least-squares  rational 
approximation.  The  connections  between  the  joint  innovations 
representation,  the  augmented  normal  equations,  and  the  Henkel 
matrix  were  pointed  out  in  [Mo]  and  also  in  [MDkV],  [MKD], 
[DKMl  where  algorithms  were  given  to  solve  equations  of  the 
type  seen  in  (IV-4)  and  (IV-S). 


Deterministic  Case 

In  [MLNV]  we  considered  the  deterministic  ease  where  we  are 
given  impulse  response  data  or  the  Markov  parameters,  here  we 
shall  assume  that  we  are  given  a series  or  observations  and  we 
want  to  find  a least-squares  (deterministic)  one-step  ARMA 
predictor  recursively  from  the  data  equivalent  to  the  RML 
algorithms  described  in  [SLC]  and  [MKL]  Our  approach  will  not 
give  a new  way  how  to  derive  these  algorithms,  but  il  will  also 
give  us  very  quickly  the  ladder  forms. 

Writing  the  input/output  relationship  in  matrix  notation  yields 


B3 


(IV-S) 


where  T-T  lower-triangular  and  Hj.  is  a full  matrix,  but  both  are  a 
product  or  two  Toplitz  matrices  containing  the  data  and  the 
normalized  one-step  prediction  errors,  which  take  place  of  the 
inputs  u - Rt/l  ( , « - y,  • ylM  , where  - EkT  in 

«.  ■ W ■ fy.  - >.1.-1 


V. 


“.ii-i 


Recursions  for  Rational  Ladder  Forms 


This  partitioning  leads  easily  to  the  rational  ladder  recursions. 
Formally  the  same  recursions  can  be  used.  However,  the  fact  that 
the  forward  predictor  is  triangular  simplifies  and  actually  makes 
the  recursions  possible  at  all,  since  the  one-step  prediction  errors 
are  not  needed  until  the  next  iteration,  i.e.  they  are  feed  back 
and  treated  at  the  next  time  step  as  the  ("other  half"  cf  the) 
observations.  It  is  easily  verified  in  the  same  context  that  half  of 
the  entries  in  T are  zero,  which  guarantees  that  the  one-step 
prediction  errors  are  not  used  before  they  are  required  in  the 


figure  2.  Rational  Ladder  realization  o/  exact  one-ztep 
least- squares  predictor. 


Appendix:  Computer  Simulations 

Layered  Media  Identification 

The  Modeling  un  of  liyeded  media  is  of  interest  in  many  ireis, 
notably  in  Ceophysics,  see  e.g.  Claerboul  [Clal,2]  and  more 
recently  in  medical  imaging  or  nondestructive  testing.  There  are 
two  basic  situations  that  occur  in  these  areas.  The  first  one, 
where  the  source  is  on  the  opposite  side  of  the  receiver  is  the 
straight  forward  case,  it  leads  to  autoregressive  or  all-pole  models, 
wich  can  be  readily  identified  by  using  the  various  methods  to 
estimate  reflection  coefficients  by  cross-correlation  of  the 
forward  and  backward  residuals  in  the  whitening  filter  in  ladder 
form,  see  e.g.  (Cl a 1,2).  The  second  case,  where  the  source  and 
receiver  are  on  the  same  side  did  up  to  now  not  lead  to  such 
simple  processing  as  the  first  case,  because  the  (imput)  transfer 
function  is  rational  in  general,  or  in  the  best  case  where  a total 
reflection  occurs  within  some  layer  the  transfer  function  is  an 
all-pass  network.  In  this  case  the  zeros  are  equal  to  the  reflected 
poles  and  the  ’reflection  coefficients’  of  the  numerator  polynomial 


are  the  negative  of  the  ones  of  the  denominator  polynomial  of  the 
transfer  function.  We  can  readily  see  then,  that  our  rational 
ladder  form  specializes  and  we  get  only  one  set  of  reflection 
coefficients  that  can  be  associated  with  the  ones  of  the  layered 
medium  that  generated  the  data.  This  particular  case  is  treated 
from  a circuit  point  of  view  by  Kung  (Kunl  Figure  3 shows  an 
example  using  real  ultrasound  returns  and  the  estimates  of  the 
reflection  coefficients  using  a ladder  structure. 


Figure  3:  Identification  of  a layered  media  via  ultrasound . 

The  experiments  were  performed  by  Linda  Joint  and  Dough  Boyd 
in  the  Stanford  Electronics  Laboratory  of  Prof.  J.  Meindl.  The 
reflection  coefficient  estimates  appear  to  be  much  smaller  than 
anticipated  from  the  experimental  set  up,  this  it  due  to  several 
factors:  The  ladder  structure  actually  identifies  not  only  the 
medium  and  the  single  reflecting  plate  in  the  path  of  the  ultrasound 
beam,  but  also  computes  an  equivalent  layer  model  for  the 
transducer.  The  estimated  values  of  the  firtt  large  reflection 
coefficients  show,  that  the  transducer  is  very  inefficient  and  not 
very  well  matched  because  the  largest  value  is  very  close  to  one, 
which  tend  to  "turn  off"  all  higher  order  reflection  coefficients. 
Further  more,  because  of  the  wavelengs  used  the  layers  of  the 
medium  have  a continuous  reflection  coefficient  density  which 
indicates  that  this  direct  scheme  must  fail  since  we  tried  to 
estimate  the  derivative  of  a function  (with  noisy  data!)  It  would 
require  the  use  of  a modified  ladder  form  that  is  parameterized  by 
the  equivalent  of  the  "area  function"  used  for  instance  in  the 
speech  modeling  context  (Wak). 

Sample  Comparison  of  Different  Reflection  Coefficient  Estimates 
Ladder  coefficient  estimates  were  in  the  past  said  to  converge 
very  slowly,  indeed  this  is  the  case  for  approximate  recursive 
methods  as  demonstrated  in  the  Figure  4,  where  three  methods  are 
compared.  Two  approximate  recursive  methods  using  arithmetic 
mean  definitions  of  the  prediction  error  (see  e.g.  (Male),  (MLVK)i 
and  and  other  computational  attractive  method  using  the  average 
of  the  product  of  the  signs  of  the  forward  and  backward 
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prrdictior  error,  which  arises  from  LI  norm  considerations  see 
Claerboui[Cli3]  and  has  often  been  used  in  circuit  design. 


Figure  4:  Companion  of  Two  Approximate  and  One  Exact 
Recursive  Method. 


The  third  method  uses  our  recursive  exact  least-squares  equations 
(pre-windowed  case).  The  first  two  schemes  give  very  similar 
results,  i.e.  a bias  of  50%  on  the  only  nonzero  third  reflection 
coefficient  (-.8)  and  sidelobes  as  high  as  20%  and  10%  typical  value, 
whereas  our  exact  method  has  virtualy  no  bias  and  half  the  maximal 
and  typical  sidelobe  values.  Furthermore,  they  actually  converged 
already  after  around  30  samples  compared  with  the  200  samples 
used  in  Figure  4 . We  may  note  that  the  other  schemes  took  much 
longer  to  actually  converge. 
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BEGIN  "speech* 


APPENDIX  C 


I 


comment 

This  Is  a complete  analysis  program  that  compuute 
reflection  coefficients,  pitch,  and  energy  Information 
as  each  new  data  Is  sampled.  The  analysis  Is  done 
Independent  of  transmission  frame  size.  It  resets  to 
start  state  whenever  silence  or  pause'  Is  encountered. 

The  pitch  detection  scheme  Is  based  on  testing  the  log 
likelihood  ratios  at  each  sample  and  there  Is  no  time  delay 
in  the  transmission. 

The  main  program  first  calls  FILEOPEN  to  open  necessaary  files. 

Then  It  calls  LADDER  which  perform  time  update  operations. 

LADDER  calls  the  following  procedures: 

INITIALIZE  for  Initialization  of  global  variables, 

TRINIT  for  Initialization  of  transmission  strategy, 

PITCHINIT  for  Initialization  of  pitch  detector, 

ORDERUPDATE  for  update  of  ladder  variables, 

RESET  when  silence  or  bad  numerical  condition  Is  encountered, 
PITCHDETECT  for  pitch  detection, 

TRANSMITTER  for  transmitting  reflection  coefficients,  pitch, 
and  energy  Information  at  each  transmission  frame 
boundary, 

FILECLOSE  Is  called  at  the  end  of  Input  file; 


p pvj  1 1> ..  i ^ i 


, 

i f 

i i 


i 


require  amsa11m.sa1[exp,1ee]a  source_f11e; 
require  amsa11p.sa1[exp,1ee]a  loadjnodule; 

# Complletlme  definitions  of  speech  analysis  constants; 

define  HAXP  = 30;  # max  order  supported; 

define  HAXPP  =6;  # number  of  pitch  pulses/  frame; 

# Comp  1 let imme  definitions  of  I/O  buffer  sizes,  the  values  used 

# here  are  Important  only  from  efficiency  considerations; 

# number  of  speech  samples  In  core  buffers; 

# In  core  transmission  values; 

# history  registers; 

# buffer  for  debug  file; 


# pointers  for  Input  files; 

4 pointers  for  output  files; 

# pointers  for  debug  data  files; 


# first  sample  and  total  number  of  speech  samples; 

# general  purpose  loop  counters; 


define  BUFSIZE  = 4896; 
define  OBUFSIZE  = 128; 
define  RBSIZE  = 128; 
define  BUGBUF  = 4896; 

# Housekeeping  variables; 

Integer  fpy; 

Integer  fko,  fpo,  feo; 
Integer  fbugl,  fbug2,  fbug3, 
fbug4,  bt; 

Integer  Fsample,  Nsamples; 
Integer  1,  J; 


# Analysis  ladder  form  parameters  and  data  storage; 


integer  t,st,tt,tm1npmax,1st;  # 

Integer  tau;  # 

Integer  p,  pmax;  # 

real  ttau; 

real  delta;  # 

real  tl,  til,  t8tl,  tlt8;  # 

real  resetsup.resetlnf ; # 

real  array  # 

e,  eZ,  r,  rZ, 

D,  K,  Ke,  Kr, 

Re,  Rr,  RrZ  [8  tc  HAXP]; 

real  array  g,  gZ[-l  tc  MAXP];  • 
real  yt;  . # current  Input; 


time  Indices; 

time  constant  for  weighting; 
parameter  order  counters; 

prior  value  of  covariance; 

1/t,  l/(t+l),  t/(t+l),  (t+l)/t  ; 
upper  and  lower  reset  threshold; 
variables  of  analysis  ladder; 


# Data  buffering  considerations: 

# Data  buffer  management  Is  handled  Independently  from  the 

# analysis.  The  particular  sizes  of  the  following  buffers  are  only 

# Important  In  that  particular  values  allow  efficient  operation 

# of  the  SU~AI  disk  system; 


real  array  y[8  tc  BUFSIZE-1];  # Input  data  buffer; 

# output  buffers  for  transmission  parameters; 

# the  first  Index  changes  per  frame,  the  second  Index  represents  the 

# number  of  parameters  of  that  type  per  frame; 

real  array  # reflection  coefficients; 

okb[8  tc  OBUFSIZE, 1 tc  HAXP]; 

real  array  # pitch  Information; 

op1tchb[8  tc  OBUFSIZE, 1 tc  HAXPP]; 

real  array  # energy  Information; 

oenergyb[8  tc  OBUFSIZE]; 

real  array  # debug  data  buffers; 

bugl, 
bug2. 
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bug3, 

bug4[8  tc  BUGBUF]; 

# Buffer  management  variables: 

# In  general,  whenever  a buffer  empties  or  fills  up,  at  that 

# point,  an  I/O  call  Is  made  to  refill  or  drain  It; 

Integer  obp;  # output  buffer  pointer; 

Integer  kbslze,  pbslze;  # multiples  of  OBUFSIZE; 

# Transmlslon  strategy  variables; 

boolean  pp,lr,rs;  # pltchpulse,  lower  reset,  reset; 

Integer  frameslze;  # number  of  samples  per  transmission; 

real  Energy;  # per  frame  energy  of  speech  samples; 

Integer  array  p1[l  tc  HAXPP];  # pitch  position  Indicator; 

Integer  ppptr;  # pointer  Into  pi  array; 

# These  following  variables  are  ring  buffer  management  varlalbes; 

# RBSIZE  Is  ring  buffer  size,  should  be  the  largest  pre-deadzone; 

# (The  transmission  module  maintains  recent  history  of  the 

# reflection  coefficients); 

real  array 

rkb[0  tc  RBSIZE- 1.1  tc  MAXP]; 
integer  rbp.trp;  # buffer  counters; 

# dead  zone  sizes  and  related  variables  for  transmission; 

# pre  and  post  deadzones  for  pitch  pulses,  lower  threshold  resets 

# and  upper  threshold  resets; 

# Variables  nextXX,  nextXXs,  otdel,  and  dzcount  reflect  Implementation 

# details  of  the  deadzone  strategy  rather  than  algorithmic  details; 

integer  pppredz , pppostdz , r spredz , r spostdz , 1 r pr edz , 1 rpostdz ; 

Integer  pptotdz.rstotdz, Irtotdz, otdel ; 

Integer  nextpp,nextlr,nextrs; 

Integer  nextpps,nextlrs,nextrss; 

Integer  dzcount; 


# pitch  detector  variables; 

real  dglnf,  dgmax,  taup,  rho,  oldmaxpt,  maxpt,  plnf,  pt,  oldpt,  dg,  alphap 
Integer  deadzp,  nextp,  sgnp,  lastptlme,  ppwlndow; 


# These  procedures  are  largely  self-explanatory.  At  the  beginning 

# of  processing,  ell  the  required  files  are  opened,  at  the  end 

# they  are  closed; 

procedure  flleOpen; 

c ■flleOpen*  # Create  files; 

fpy  «-  open  ("speech  Input  file:  ■,  DATA  ! INPUT  I PROMPT  ); 

fko  *r  open  ("kkk.dat",  DATA  1 CREATE  I OUTPUT  ); 

fpo  «■  open  ("plt.dat",  DATA  I CREATE  ! OUTPUT  ); 

feo  «•  open  ("en.dat",  DATA  ! CREATE  ! OUTPUT  ); 

fbugl  «-  open  ("bugl.dat";  DATA  ! CREAfE  ! OUTPUT  ); 

fbugZ  - open  ("bug2.dat",  DATA  ! CREATE  ! OUTPUT  ); 

fbug3  - open  < "bug3.dat",  DATA  ! CREATE  ! OUTPUT  ); 

fbug4  - open  (■bug4.dat",  DATA  I CREATE  ! OUTPUT  ); 

a "flleOpen"  ; 

procedure  flleClose; 
c "flleClose" 

close(fpy); 

close(fko); 

. close(fpo); 
close(feo); 
close( fbugl); 

ttyWr1te(  "BUG1.DAT  contains  predetection  pitch  postltlon",  nl); 
c1ose(fbug2); 

ttyWrite(  "BUG2.DAT  contains  postdetection  pitch  postltlon",  nl); 
c1ose(fbug3); 

ttyWr1te(  "BUG3.DAT  contains  dg",  nl); 
close(fbug4); 

ttyWr1te(  "BUG4.DAT  contains  Innovations",  nl); 

a "flleClose"; 


procedure  Initialize; 
c •initialize* 

# Initialize  analysis  algorithm  variables; 

ttyWr1te'( "Initialization  of  tracking  parameters:",  nl); 
ttyWrite("  tau  (160)  = "); 

read(ttyRead,  tau); 

ttyWr1te("  Prior  covariance  (0.00001)  = "); 

read(ttyRead,  delta); 

ttyWr1te("  upper  reset  threshold  on  g[pmax]  (0.99)  * 
read(ttyRead,  resetsup  ); 

ttyWr1te("  lower  reset  threshold  on  g[pmax]  (0.0001) 
read(ttyRead,  resetlnf  ); 
for  1 «-  0 upto  pmax  do 

c e[1]  r[1]  «-  rZ[1]  *■  0; 

D[ 1 3 - Ke[ 1 3 - Kr[1]  - K[1]  - 0; 

Re[1]  - Rr[1]  ••  RrZ[1)  «-  0; 

' oC  1 3 •*  qZC  1 3 0; 

ot-i]  - oZC-l]  - 0; 

Re[0]  «-  Rr[0]  «-  delta  ; 

st  «-  0;  # set  reset  pointer  to  0; 

1st  «-  -20;  # printout  interval  for  last  reset; 

# Initialize  data  buffering; 

# briefly,  the  Idea  Is  to  break  up  Nsamples  Into  groups; 

# of  no  more  than  BUFSIZE,  reading  and  writing  Is  done  on 

# these  smaller  segments; 

# buffer  management,  Initialize  pointers; 

bt  0;  # pointer  for  debug  file; 

rbp  •-  0; 

obp  «-  0; 
trp  «*  0; 

kbslze  «-  OBUFSIZE  * MAXP; 
pbslze  ••  OBUFSIZE  • MAXPP; 
for  1 «-  0 upto  RBSIZE-1  do 

for  J 1 upto  MAXP  do 
rkb[1,J]  ••  0; 
for  1 «-  0 upto  OBUFSIZE- 1 do 
c for  j + 1 upto  MAXP  do 
okb[1,J]  0; 
for  J «-  1 upto  MAXPP  do 

op1tchb[1,J]  -1; 
oenergyb[1]  *■  0; 

a "Initialize"; 


procedure  reset; 
c "reset" 

# This  procedure  Is  nearly  equlvllent  to  the  analysis 

# algorithm  parts  of  the  Initialize  procedure  above; 

If  (t  - 1st)  geq  20  then 

c If  rs  then  ttyWrlte  ("Upper  Reset  at  t = ",  t,  nl) 

else  ttyWrlte  ("Lower  Reset  at  t = ",  t,  nl); 
1st  «-  t; 

=•; 

st  «-  -1; 

for  1 «•  8 upto  pmax  do 

c D[1)  «•  Ke[1J  - ICr[1]  - K[1]  «-  O; 

Re[1>  Rr[1]  - RrZ[1)  *-  0; 
e[1]  •>  r[1]  *•  rZ[1]  *•  0; 
fl[1]  «■  0Z[i]  - 0; 

Re[0]  Rr[0]  *■  delta; 

a "reset"; 


procedure  trlnlt; 
c "trlnlt* 

# Initialize  transmission  strategy  variables; 

Energy  «-  8; 

ttyWrite  ("Initialization  of  transmitter  parameters:",  nl); 
ttyWrite  ( " Transmission  frameslze  = "); 
read  (ttyRead,  frameslze); 
ttyWrite  ( " pp  pre  dz  (3)=  ■); 
read  ( ttyRead,  pppredz  ); 
ttyWrite  ( • pp  post  dz  (28)=  "); 
read  ( ttyRead,  pppostdz  ); 
ttyWrite  ( " low  reset  pre  dz  (8)=  *); 
read  ( ttyRead,  Irpredz  ); 
ttyWrite  ( ■ low  reset  post  dz  (28)=  ■); 
read  ( ttyRead,  Irpostdz  ); 
ttyWrite  ( ■ reset  pre  dz  (pmax)=  "); 
read  ( ttyRead,  rspredz  ); 
ttyWrite  ( • reset  post  dz  (28)=  "); 
read  ( ttyRead,  rspostdz  ); 
pptotdz  *•  pppredz  + pppostdz; 
lrtotdz  «-  Irpredz  + Irpostdz; 
rstotdz  «•  rspredz  + rspostdz; 
otdel  «-  pppredz  max  ( Irpredz  max  rspredz  ); 
nextpps  «■  otdel  - pppredz  + 1; 
nextlrs  «-  otdel  - Irpredz  + 1; 
nextrss  «•  otdel  - rspredz  + 1; 
a "trlnlt"; 


procedure  pltchlnlt 
c "pltchlnlt* 


# Initialize  pitch  detector  variables; 

ttyWrlte  ( "Initialization  of  pitch  detector  parameters:*,nl ) ; 
ttyWrlte  ( " post  detect  deadzone  = ( 30)  "); 
read  ( ttyRead,  deadzp  ); 

ttyWrlte  ( " scanner  time  constant  taup  = (58)  "); 
read  ( ttyRead,  taup  ); 

ttyWrlte  ( " scanner  upperthreshold  factor  = (3)  ■); 
read  ( ttyRead,  alphap); 

ttyWrlte  ( ■ pre  scanner  window  = ( 10)  "); 
read  ( ttyRead,  ppwlndow  ); 

ttyWrlte  ( * scanner  lower  threshold  Pinf  = ( 0.003)  "); 

read  ( ttyRead,  pinf  ); 

ttyWrlte  ( " Inf  for  da  (0.01)=  " ); 

read  ( ttyRead,  dglnf  ); 

ttyWrlte  ( " norm  for  dgmaxp  (0.25)=  " ); 

read  ( ttyRead,  dgmax  ); 

oldpt  *•  0; 

oldmaxpt  «-  maxpt  ♦>  pinf; 

rho  «-  exp  ( -1/taup); 

lastptime  «-  nextp  «■  0; 

for  1 1 upto  NAXPP  do  p1[1J  «■  -1; 

ppptr  «-  O; 


s "pltchlnlt"; 


procedure  orderUpdate; 
c "orderUpdate" 


i See  algorithm  descriptions  for  an  explanation  of  this  procedure; 

i order  updates  the  Innovations; 
for  p «-  0 upto  ( tmlnpmax  - 1 ) do 
c "stepupOrder* 

# Update  the  partial  correlations; 

If  tI*gZ[p-l]  geq  1.  then  # numerical  conditioning  control 

c ttyWr1te("gZ[\  p-1,  "]  >=  ",  l./tl,  "at  t » t,  NL); 

gZ[p-l]  - 0.99/tI  ; 
a; 

D[p+1]  - D[p+1]  + tlI*(rZ[p]*e[p]/(l  - tI*gZ[p-l])  - D[p+1]); 

g[p]  - g[p-l]  + r[p]  * r[p]  / Rr[p]; 

Ke[p+1]  «•  D[p+1]  / Re[p]; 

Kr[p+1]  - tlt0*E»[p*1]  / RrZ[p]; 

K [p+1]  - If  ( OLP+i]  < 0 ) 

then  - sqrt(  abs(  Ke[p+l]*Kr[p+l])) 
else  sqrt(  abs(  Ke[p+l]*Kr[p+l]) ) ; 

• 

# Update  the  prediction  errors  and  their  covariances; 

r[p+l]  *■  rZ[p]  - Ke[p+1]  * e[p]; 
e[p+l]  «•  e[p]  - Kr[p+1]  * rZ[p]; 

If  p * { st  • 1 ) then 

c # Order  update  on  Initial  covariances; 

Re[p+1]  «-  Re[p]  - Kr[p+1]  * D[p+1]; 

Rr[p+1]  «■  t8tl*RrZ[p]  - Ke[p+l]*D[p+l]; 

else 

c # Time  update  on  subsequent  covariances; 

Re[p+1]  «-  Retp+1]  + tll»(  e[p+l>e[p+l]/( l-tI*gZ[p])  - Re[p+1]  ) 
Rr[p+1]  «-  Rr[p+1]  + tll»(  r[p+l]*r[p+l]/(l-tll*g[p])  - Rr[p+1]  ) 


=»; 

a "stepupOrder"; 

# Update  g[ tmlnpmax]; 

g[ tmlnpmax]  «-  g[tm1npm«x-l]  ♦ r[tm1npmax]*r[tm1npmax]/Rr[ tmlnpmax]; 


s "orderUpdate"; 


procedure  pltchdetect; 
c "pltchdetect* 

# compute  differential  likelihood  ratio; 

dg  •-  ( tll*g[ tmlnpmax]  . tI*gZ[tm1npmax])/dgmax; 

if  dg  < dglnf  then  dg  «*  0; 

If  dg  neq  0 

then  # extract  nongausslan  pulse; 

c If  abs(eZ[tm1npmax])  > abs(e[tm1npmax]) 
then  pt  «-  eZ[tm1npmax] 
else  pt  «•  e[tm1npmax]; 

else  pt  - 6; 

If  pt  < 0 then  # Invert  negative  pulse; 
c pt  - pt  / 2; 
sgnp  «-  -1;. 

else  sgnp  «■  1; 
bug3[bt]  «•  pt  * sgnp; 


# start  pitch  scanner; 

If  t > nextp 

then  # start  exponential  scanner; 
c raaxpt  «-  ( maxpt  * rho  ) max  plnf; 

If  pt  > maxpt  then  # hit  first  pitch  pulse; 
c # set  new  scanner  threshold; 

maxpt  «-  pt  min  ( alphap  * oldmaxpt  ); 

oldmaxpt  «-  maxpt; 

oldpt  *■  pt; 

lastptlme  *■  t; 

nextp  *■  t + deadzp; 

pp  *■  TRUE; 

=>; 

3 

else  If  ( (t  - lastptlme)  leq  ppwlndow  ) and  ( pt  > 2 * oldpt  ) 

then  c # encounter  2nd  higher  pulse,  restart  scanner 
nextp  *■  t ♦ deadzp; 
maxpt  «-  pt  min  ( alphap  * oldmaxpt  ); 
oldmaxpt  «-  maxpt; 
oldpt  •-  pt; 

# discard  and  update  current  pitch  location; 
plCppptr  max  1]  «-  t mod  FRAMESIZE; 

3 

else  pp  •-  FALSE; 

# output  debug  variables; 
bug4[bt]  *•  e[  tmlnpmax]; 
bug2[bt]  «-  dg; 

If  pp  then  bugl[bt]  «-  pt  * sgnp  else  bugl[bt]  <-  B; 
bt  «■  ( bt  ♦ 1 ) mod  BUGBUF ; 

If  bt  * 8 then 

c aryWr1te(  fbugl,  bugl,  BUGBUF  ); 

aryWr1te(  fbug2,  bug2,  BUGBUF  ); 
aryWr1te(  fbug3,  bug3,  BUGBUF  ); 
aryWr1te(  fbug4,  bug4,  BUGBUF  ); 


3 "pltchdetect"; 


f 

■ 


procedure  transmitter; 
c "transmitter" 

# This  procedure  Implements  the  transmission  strategy; 


# mark  pitch  positions; 

If  pp  then 

c ppptr  *■  ppptr  +1; 
if  ppptr  leq  MAXPP 

then  p1[ ppptr]  «•  t mod  FRAMESIZE; 

=>; 

# compute  total  Input  energies  since  last  transmission; 

.Energy  «-  Energy  + e[0]»2; 

# save  ref  coeffs  In  ring  buffers; 

rbp  «•  (rbp  + 1)  HOD  RBSIZE; 
for  1 «-  1 upto  pmax  do 

. rkb[rbp,1]  «■  K[1]j 

# This  section  of  code  Implements  the  strategy  for 

# transmitting  "reeasonable"  reflection  coefficients; 

# calculate  transmit  pointer  In  ring  buffer; 

If  pp  then  nextpp  *■  nextpps;  # const!; 

If  lr  then  nextlr  «-  nextlrs;  # const!; 

If  rs  then  nextrs  *-  nextrss;  # const!; 

If  nextpp  = 1 then  dzcount  «-  dzcount  max  pptotdz; 

If  nextlr  * 1 then  dzcount  *■  dzcount  max  Irtotdz; 

If  nextrs  s 1 then  dzcount  «-  dzcount  max  rstotdz; 

If  nextpp  > 8 then  nextpp  •*  nextpp  - 1; 

If  nextlr  > 0 then  nextlr  nextlr  - 1; 

If  nextrs  > 0 then  nextrs  «■  nextrs  - 1; 

# If  we  are  In  a deadzone,  then  leave  transmit  pointer  alone, 

# otherwise,  update  It; 

If  dzcount  > 0 

then  dzcount  «-  dzcount  - 1 

else  trp  «-  (rbp  - otdel  + RBSIZE)  MOD  RBSIZE; 


# If  we  are  at  a frame  boundary  then  transmit  parameter  vector; 
If  ((t+1)  MOD  frameslze)  • 0 then 
c "transmit" 

# move  vector  to  output  buffers; 
for  1 •*  1 upto  pmax  do 

okb[obp,1]  rkb[trp,1]; 
for  1 ••  1 upto  MAXPP  do 

opltchb[obp,1]  ♦>  pIC  1 3; 
oenergyb[obp]  Energy; 

# If  necessary,  empty  buffers; 

obp  - (obp  ♦ 1)  NOD  OBUFSIZE; 

If  obp  ■ 0 then 


c 

aryWr 1 te ( f ko , okb , kbs 1 ze ) ; 
aryWr1te(fpo,op1tchb,pbs1ze); 
aryWr 1 te{ f eo , oenergyb , OBUFSIZE ) ; 


I! 


# reset  pointers  for  new  frame; 

for  1 - 1 upto  MAXPP  do  p1[1]  «■  -1;  # reset  pitch  Indicators; 

ppptr  «•  0; 

Energy  ^ 0;  # reset  residual  energy; 

d "transmit*; 
a "transmitter"; 


0 

Lj 


procedure  ladder; 
c "ladder" 

# Perform  one-step  predictor  ladder  form; 

# Procedures  required:  Initialize,  pltchlnlt,  trlnlt; 

# orderupdate,  transmitter; 


Initialize;  # Initialize  ladder  variables; 

trlnlt;  # Initialize  transmission  strategy; 

pltchlnlt;  # Initialize  pitch  detector  variables; 

# data  prescan  to  eliminate  leading  zeros; 

# skip  all  zero  Inputs; 

1 «-  0; 

setPos  (fpy,  Fsample); 
aryRead(fpy,y,BUFSIZE) ; 
while  (yC 1 ] = 0)  do 
c 

1-  1+1; 

If  (1  mod  BUFSIZE)  ■ 0 then 

aryRead(fpy,y, BUFSIZE); 

yt  «-  y[l]; 

# check  threshold  of  first  Input  data; 

If  abs(yt)  < delta  then 

c ttyUrlte  ("  |yt»  ",yt,"|  < delta  = ", delta,", 
yt  * sgn(yt)*de1ta  ",NL); 
yt  «•  (yt/abs(yt))  * sqrt(delta); 

setpos(fpy,  Fsample+1); 

# start  recursive  ladder  form; 
tt  - 0; 

for  t «-  0 upto  Nsamples  do 
c "malnloop” 

If  (tt  mod  BUFSIZE)  > 0 
then  c tt  *-  0; 

aryRead(fpy,y, BUFSIZE) ; 

If  (t  mod  512)  > O then  ttyWrlte  (■$•); 


# set  time  min  order  Index  since  last  reset; 
tmlnpmax  *•  st  min  pmax; 

If  t * 0 then  yt  «■  y[  tt  3;  “■ 

# compute  weighting  factor; 

If  tau  « 0 

then  # time-weighted; 
c til  - l./(t  ♦ 1.); 
t0tl  t/(t  ♦ 1.); 

If  t > 0 then  c tl  l./t; 

tlt0  (t  ♦ l.)/t; 
a 

else  tl  - tlt0  - 1; 
a 

else  # exponential  weighting; 
c ttau  » ( st  + 10*pmax  ) min  tau; 


■ 

■auto.. - 


tl  - l./ttau; 
til  - l./(ttau  + 1.); 
tOtl  *■  ttau/(ttau  + 1.); 
tlt8  (ttau  ♦ l.)/ttau; 

=»; 

# Update  delayed  values; 

for  1 8 upto  pmax  do 

c rZ[1]  - rC  1 ].; 

eZ[1]  «-  e[1]; 

RrZ[ 1 ] - Rr[ 13; 
flZCI]  - bCOs 

=»; 

# start  zeroth  order  ladder; 
e[0]  «-  r[0]  - yt; 

Re[0]  ••  Rr[0]  •-  Rr[0]  + tll*(yt*yt  - Rr[0]); 

# order  update  the  ladder; 
orderUpdate; 

# test  for  reset; 

If  (tll*g[ 1]  geq  resetsup)  then  rs  *•  TRUE 

else  rs  FALSE; 

If  (tll*g[tm1npmax]  < resetlnf)  then  lr  «•  TRUE 

else  lr  «-  FALSE; 

If  rs  v lr  then  reset; 

# call  pitch  detector; 
pltchdetect; 

# Update  time  Index  since  last  reset; 
st  * st  + 1; 

tt  «-  tt  + 1; 

# call  transmitter; 
transmitter; 

s "main loop"; 
a "ladder"; 


• * HAIN  PROGRAM; 


ttyVr1te( "first  sample  « •); 
read( ttyRead,  Fsample); 
ttyWrite  ( "Nsamples  « ■); 
read  ( ttyRead,  Nsamples  ); 
j ttyWrite  ( "pmax  s •); 

( ■ read  ( ttyRead,  pmax  ); 

I flleOpen; 

ladder; 

flleClose; 


END  "speech" 

L ; 

I 


! [ I 
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II 

BEGIN  "synthesis" 


comment 

i . This  ts  a speech  synthesizer  program.  It  takes 

as  Input  vectors  transmitted  by  SPEECH  program 
and  synthesis  speech  according  to  the  frameslze 
and  order  of  filter  prescribed; 

. 

U . 

0 


> 

■I 

*j 


I I I 
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require  amsa11m.sa1[exp,1ee3*  source_file; 

require  "msa11p.sa1[exp,1ee]"  load  nodule; 

define  PMAX  > 12; 

define  OBUF  > 4096; 

define  IBUF  = 128; 

define  MAXPP  « 6; 

Integer  p,  ppmax,  Nsamples,  fr,  t,  1; 

Integer  ptcount; 

real  Re,  gnolse,  gain,  pulse,  npulse,  nu; 

integer  fpy,fpg,fpe,fpp,fpk;  # file  handles; 

Integer  bt;  # buffer  pointer; 

real  yt; 

real  array  e,  r,  rZ  [ 8 tc  PMAX  3; 

real  array  K[  1 tc  PMAX  ]; 

real  array  C[  0 tc  IBUF  , 1 tc  PMAX  ]; 

real  array  pp[  0 tc  IBUF  , 1 tc  MAXPP  ]; 

real  array  p1tch[  1 tc  MAXPP  ]; 

real  array  y[  0 tc  OBUF  ]; 

real  array  En[  0 tc  IBUF  ]; 

real  array  rannum  [ 0 tc  4800]; 

Integer  frnum,  frames Ize; 

Integer  obp; 

Integer ' cbs 1 ze , pps 1 ze ; 


procedure  Initialize; 
c "Initialize" 

ttyWrlte  ( " number  of  frames  ® " ); 
read  ( ttyRead,  frnum  ); 
ttyWrlte  ( ■ frames Ize  * * ); 
read  ( ttyRead,  frames Ize  >; 

fpk  «-  open  ( "coefficient  filename  : ",  data  I Input  ! prompt  ) 
fpp  «-  open  ( "plt.dat",  data  I Input  ); 
fpe  «*  open  ( "en.dat",  data  I Input  ); 
fpg  «•  open  ( "grand. da[sp, lee]",  data  ( Input  ); 
aryRead  ( fpg  , r annum  ,,  4089  ); 
close  ( fpg  ); 
for  1 0 upto  PMAX  do 

e[1]  «-  r[1]  8; 

yt  ♦*  0; 

Nsamples  «-  1; 

fpy  *■  open  ( "y.p36",  data  I create  ! output  ); 
obp  «-0; 

for  1 •-  0 upto  OBUF  - 1 do 

y[i]  *•  8; 

cbslze  •-  PMAX  * 1BUF; 
ppslze  4.  MAXPP  * IBUF; 


a "Initialize 


procedure  ladder; 
c "ladder" 

for  p «-  0 upto  (pmax  - 1 ) do  rZ[p]  r[p]; 
gnolse  *-  r annum  [ Nsamples  mod  4088  ]; 

If  p1tch[ 1]  = -1 
then  nu  - gnoise  * gain 
else  c nu  *■  npulse; 

for  1 ••  1 upto  MAXPP  do 

If  (t  * p1tch[1])  then  nu  *■  pulse; 

=»; 

If  Nsamples  < PMAX 
then 

for  1 *■  1 upto  Nsamples  do 

nu  *•  nu  « sqrt  ( ( 1 - K[  1 ]*K[  1 ] ) ) 

ppmax  *•  Nsamples  min  PMAX; 

e [ ppmax  ] «-  nu; 

for  p *■  ppmax  downto  1 do 

e[p-l]  «■  e[p]  + K[p]  * rZ[p-l]; 

for  p «•  1 upto  ppmax  do 

r[p]  «-  rZ[p-l]  - K[p]  * e[p-l]; 

yt  *•  r[8]  e[0]; 

y[obp]  «-  yt; 

obp  «-  (obp  ♦ 1)  mod  OBUF; 

If  obp  s 0 then 

aryWr 1 te ( f py , y , OBUF ) ; 

If  (Nsamples  mod  512)  « 0 then 
ttyWr1te(*S"); 

Nsamples  «-  Nsamples  + 1; 


a "ladder"; 


Initialize; 

aryRead(fpk,C,cbs1ze); 

aryRead( f pe , En , IBUF ) ; 

aryRead(fpp,pp,ppsize); 

bt  0; 

for  fr  «-  0 upto  frnum  - 1 do 

c # synthesis  one  frame  of  speech; 

for  1 1 upto  PMAX  do 

K[1]  - CCbt.l]; 

Re  «-  En[bt]; 

for  1 1 upto  PMAX  do 

. Re  *-  Re  * ( 1 - K[1]t2  ); 
for  1 - 1 upto  MAXPP  do 

p1tch[1]  pp[bt,  1]; 

If  p1tch[l]  = -1 

then  gain  «■  sqrt(  Re/FRAMESIZE  ) 
else 

c ptcount  «•  0; 

for  1 «-  1 upto  MAXPP  do 

If  p1tch[1]  neq  -1  then  ptcount  **  ptcount  + 1 
pulse  •-  sqrt  ( Re/ptcount  ); 
npulse  «*  0; 

=>; 

for  t «•  0 upto  frameslze  - 1 do 
ladder; 

bt  - (bt  + 1)  mod  IBUF; 

If  bt  = B 
then  c 

aryRead(fpk,C,cbsize);  ' 
aryRead( f pe , En , IBUF ) ; 
aryRead(fpp,pp,pps1ze); 

• 

close(fpy); 

close(fpk); 

close(fpe); 

close(fpp); 

END  "synthesis"; 


